Abstract. We introduce estimation and test procedures through divergence minimization for models satisfying linear constraints with unknown parameter. These procedures extend the empirical likelihood (EL) method and share common features with the generalized empirical likelihood (GEL) approach. We treat the problems of existence and characterization of the divergence projections of probability distributions on sets of signed finite measures. We give a precise characterization of duality for the proposed class of estimates and test statistics, which is used to derive their limiting distributions (including those of the EL estimate and the EL ratio statistic) both under the null hypotheses and under alternatives or misspecification. An approximation to the power function is deduced, as well as the sample size which ensures a desired power for a given alternative.
1. Introduction and notation
Statistical models are often defined through estimating equations

$$\mathbb{E}[g(X,\theta)] = 0,$$

where $\mathbb{E}[\cdot]$ denotes the mathematical expectation, and $g := (g_1,\dots,g_l)^T \in \mathbb{R}^l$ is some specified vector valued function of a random vector $X \in \mathbb{R}^m$ and a parameter vector $\theta \in \Theta \subset \mathbb{R}^d$. Examples of such models are numerous; see e.g. Qin and Lawless (1994), Haberman (1984), Sheehy (1987), McCullagh and Nelder (1983), Owen (2001) and the references therein. Denoting by $\mathcal{M}^1$ the collection of all probability measures (p.m.) on the measurable space $(\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$, the submodel $\mathcal{M}^1_\theta$, associated to a given value $\theta$ of the parameter, consists of all distributions $Q$ satisfying $l$ linear constraints induced by the vector valued function $g(\cdot,\theta)$, namely

$$\mathcal{M}^1_\theta := \left\{ Q \in \mathcal{M}^1 \text{ such that } \int g(x,\theta)\, dQ(x) = 0 \right\},$$

with $l \ge d$. The statistical model which we consider can be written as

$$\mathcal{M}^1 := \bigcup_{\theta \in \Theta} \mathcal{M}^1_\theta. \tag{1.1}$$

Let $X_1,\dots,X_n$ denote an i.i.d. sample of $X$ with unknown distribution $P_0$. We denote by $\theta_0$, if it exists, the value of the parameter such that $P_0$ belongs to $\mathcal{M}^1_{\theta_0}$, namely the value satisfying

$$\mathbb{E}[g(X,\theta_0)] = 0,$$

and we naturally assume that $\theta_0$ is unique. This paper addresses the two following natural questions:

Problem 1: Does $P_0$ belong to the model $\mathcal{M}^1$?

Problem 2: When $P_0$ is in the model, which is the value $\theta_0$ of the parameter for which $\mathbb{E}[g(X,\theta_0)] = 0$? Also, can we perform tests about $\theta_0$? Can we construct confidence areas for $\theta_0$?

We note that these problems have been investigated by many authors. Hansen (1982) considered the generalized method of moments (GMM). Hansen et al. (1996) introduced the continuous updating (CU) estimate. The empirical likelihood (EL) approach, developed by Owen (1988) and Owen (1990), has been investigated in the context of model (1.1) by Qin and Lawless (1994) and Imbens (1997), who introduced the EL estimate. The recent literature in econometrics focuses on such models; Smith (1997) and Newey and Smith (2004) provided a class of estimates, called generalized empirical likelihood (GEL) estimates, which contains the EL and the CU ones. Schennach (2007) discussed the asymptotic properties of the empirical likelihood estimate under misspecification; the author showed the important fact that the EL estimate may cease to be root-$n$ consistent when the functions $g_j$ defining the moment conditions and the support of $P_0$ are unbounded. Among other results pertaining to EL, Newey and Smith (2004) stated that the bias-corrected EL estimate enjoys optimality properties in terms of efficiency among all GEL estimates, including the GMM one. Moreover, Corcoran (1998) and Baggerly (1998) proved that, in a class of minimum discrepancy statistics (called power divergence statistics), the EL ratio is the only one that is Bartlett correctable. Confidence areas for the parameter $\theta_0$ have been considered in the seminal paper by Owen (1990). Problems 1 and 2 have been handled via EL and GEL approaches in Qin and Lawless (1994), Smith (1997) and Newey and Smith (2004) under the null hypothesis $\mathcal{H}_0: P_0 \in \mathcal{M}^1$; the limiting distributions of the GEL estimates and the GEL test statistics have been obtained under the model and under the null hypotheses. Imbens (1997) discusses the asymptotic properties of the EL and exponential tilting estimates under misspecification and gives the formula of the asymptotic variance, using dual characterizations, without presenting the hypotheses under which these results hold. Chen et al. (2007) give the limiting distribution of the EL estimate under misspecification, as well as that of the EL ratio statistic between a parametric model and a moment condition model. The paper by Kitamura (2007) gives a discussion of duality for GEL estimates under moment condition models. Bertail (2006) uses duality to study, under the model, the asymptotic properties of the EL ratio statistic and its Bartlett correctability; the author extends his results to semiparametric problems with infinite-dimensional parameters.

The main contribution of the present paper is the precise characterization of duality for a large class of estimates and test statistics (including GEL and EL ones) and its use in deriving the limiting properties of both the estimates and the test statistics under misspecification and under alternative hypotheses. Moreover,

1) The approach which we develop is based on minimum discrepancy estimates, which extends the EL method and has common features with minimum distance and GEL techniques, using merely divergences. We present a wide class of estimates, test statistics and confidence regions for the parameter $\theta_0$, as well as various test statistics for Problems 1 and 2, all depending on the choice of the divergence.

2) The limiting distribution of the EL test statistic under the alternative and under misspecification has remained an open problem to date. The present paper fills this gap; indeed, we give the limiting distributions of the proposed estimates and test statistics (including the EL ones) under the null hypotheses, under alternatives and under misspecification.

3) The limiting distributions of the test statistics under the alternatives and misspecification are used to give an approximation to the power function and the sample size which ensures a desired power for a given alternative.

4) We extend confidence region (C.R.) estimation techniques based on EL (see Owen 
(1990)), providing a wide range of such C.R.'s, each one depending upon a specific 
divergence. 

From the point of view of the statistical criterion under consideration, the main advantage of using a divergence based approach and duality lies in the fact that it leads to asymptotic properties of the estimates and test statistics under the alternative, including misspecification, which cannot be achieved through the classical EL context. In the case of parametric models of densities, White (1982) studied the asymptotic properties of the parametric maximum likelihood estimate and of the parametric likelihood ratio statistic under misspecification; Keziou (2003) and Broniatowski and Keziou (2009) stated the consistency and obtained the limiting distributions of the minimum divergence estimates and of the corresponding test statistics (including the parametric likelihood ones) both under the null hypotheses and under the alternatives, from which they deduced an approximation to the power function. In this paper, we extend the above results for the proposed class of estimates and test statistics (including the EL ones) to the context of the semiparametric models (1.1).

The rest of the paper is organized as follows. Section 2 describes the statistical divergences used in the sequel. Section 3 is devoted to the description of the proposed estimation and test procedures. In Section 4, we adapt the Lagrangian duality formalism to the context of statistical divergences, and we use it to give practical formulas (for the study and the numerical computation) of the proposed estimates and test statistics. Section 5 deals with the asymptotic properties of the estimates and the test statistics under the model and under misspecification. Simulation results are given in Section 6. All proofs are postponed to the Appendix.



2. Statistical divergences

We first set some general definitions and notation. Let $P$ be some p.m. on the measurable space $(\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$. Denote by $\mathcal{M}$ the space of all signed finite measures (s.f.m.) on $(\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$. Let $\varphi$ be a convex function from $\mathbb{R}$ onto $[0,+\infty]$ with $\varphi(1) = 0$, and such that its domain, $\operatorname{dom}\varphi := \{x \in \mathbb{R} \text{ such that } \varphi(x) < \infty\} =: (a,b)$, is an interval with endpoints $a < 1 < b$, which may be bounded or unbounded, open or not. We assume that $\varphi$ is closed¹.

¹ The closedness of $\varphi$ means that if $a$ or $b$ are finite then $\varphi(x) \to \varphi(a)$ when $x \downarrow a$, and $\varphi(x) \to \varphi(b)$ when $x \uparrow b$. Note that this is equivalent to the fact that the level sets $\{x \in \mathbb{R};\ \varphi(x) \le \alpha\}$, $\forall \alpha \in \mathbb{R}$, are closed in $\mathbb{R}$ endowed with the usual topology.

For any s.f.m. $Q \in \mathcal{M}$, the $\varphi$-divergence between $Q$ and the p.m. $P$, when $Q$ is absolutely continuous with respect to (a.c.w.r.t.) $P$, is defined through

$$D_\varphi(Q,P) := \int_{\mathbb{R}^m} \varphi\left(\frac{dQ}{dP}(x)\right) dP(x), \tag{2.1}$$

in which $\frac{dQ}{dP}(\cdot)$ denotes the Radon-Nikodym derivative. When $Q$ is not a.c.w.r.t. $P$, we set $D_\varphi(Q,P) := +\infty$. For any p.m. $P$, the mapping $Q \in \mathcal{M} \mapsto D_\varphi(Q,P)$ is convex and takes nonnegative values. When $Q = P$, then $D_\varphi(Q,P) = 0$. Furthermore, if the function $x \mapsto \varphi(x)$ is strictly convex on a neighborhood of $x = 1$, then

$$D_\varphi(Q,P) = 0 \text{ if and only if } Q = P. \tag{2.2}$$

All the above properties are presented in Csiszar (1963), Csiszar (1967) and in Chapter 1 of Liese and Vajda (1987), for $\varphi$-divergences defined on the set of all p.m.'s $\mathcal{M}^1$. When the $\varphi$-divergences are extended to $\mathcal{M}$, the same arguments as developed on $\mathcal{M}^1$ hold. When defined on $\mathcal{M}^1$, the Kullback-Leibler (KL), modified Kullback-Leibler (KL$_m$), $\chi^2$, modified $\chi^2$ ($\chi^2_m$), Hellinger ($H$), and $L^1$ divergences are respectively associated to the
convex functions $\varphi(x) = x\log x - x + 1$, $\varphi(x) = -\log x + x - 1$, $\varphi(x) = \frac{1}{2}(x-1)^2$, $\varphi(x) = \frac{1}{2}(x-1)^2/x$, $\varphi(x) = 2(\sqrt{x}-1)^2$ and $\varphi(x) = |x-1|$. All these divergences, except the $L^1$ one, belong to the class of the so-called power divergences introduced in Cressie and Read (1984) (see also Liese and Vajda (1987) and Pardo (2006)). They are defined through the class of convex functions

$$x \in ]0,+\infty[\ \mapsto \varphi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma-1)} \tag{2.3}$$

if $\gamma \in \mathbb{R} \setminus \{0,1\}$, $\varphi_0(x) := -\log x + x - 1$ and $\varphi_1(x) := x\log x - x + 1$. So, the KL-divergence is associated to $\varphi_1$, the KL$_m$ to $\varphi_0$, the $\chi^2$ to $\varphi_2$, the $\chi^2_m$ to $\varphi_{-1}$ and the Hellinger distance to $\varphi_{1/2}$. We extend the definition of the power divergence functions $Q \in \mathcal{M}^1 \mapsto D_{\varphi_\gamma}(Q,P)$ onto the whole set of signed finite measures $\mathcal{M}$ as follows. When the function $x \mapsto \varphi_\gamma(x)$ is not defined on $]-\infty,0[$, or when $\varphi_\gamma$ is defined on $\mathbb{R}$ but is not convex, we extend the definition of $\varphi_\gamma$ as follows:

$$x \in \mathbb{R} \mapsto \varphi_\gamma(x)\,\mathbb{1}_{[0,+\infty[}(x) + (+\infty)\,\mathbb{1}_{]-\infty,0[}(x). \tag{2.4}$$

Note that for the $\chi^2$-divergence, the corresponding function $\varphi(x) = \frac{1}{2}(x-1)^2$ is convex and defined on the whole of $\mathbb{R}$. In this paper, for technical considerations, we assume that the functions $\varphi$ are strictly convex on their domain $(a,b)$ and twice continuously differentiable on $]a,b[$, the interior of their domain. Hence, $\varphi'(1) = 0$, and $\varphi''(x) > 0$ for all $x \in ]a,b[$. Here, $\varphi'$ and $\varphi''$ denote respectively the first and the second derivative functions of $\varphi$. Moreover, we assume that $\varphi$ is "essentially smooth", in the sense that $\lim_{x \downarrow a}\varphi'(x) = -\infty$ if $a$ is finite and $\lim_{x \uparrow b}\varphi'(x) = +\infty$ if $b$ is finite. Note that the above assumptions on $\varphi$ are not restrictive, and that all the power functions $\varphi_\gamma$, see (2.4), satisfy the above conditions, including all standard divergences.
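The family (2.3)-(2.4) can be sketched in Python as below. This is a minimal illustration, not the authors' code, and the helper name `phi_gamma` is hypothetical. Following the text, $\varphi_2$ (the $\chi^2$ case) is kept on the whole real line because it is convex there, while for every other exponent the function is set to $+\infty$ on $]-\infty,0[$ as in (2.4); other integer exponents for which $\varphi_\gamma$ happens to be convex on $\mathbb{R}$ are not treated specially in this sketch.

```python
import math


def phi_gamma(gamma: float, x: float) -> float:
    """Power divergence function phi_gamma of (2.3), extended as in (2.4).

    Hypothetical helper: gamma = 2 (chi^2) is kept on the whole real line,
    every other gamma is set to +inf on ]-inf, 0[.
    """
    if gamma == 2:
        return 0.5 * (x - 1.0) ** 2             # chi^2: convex on R, no extension
    if x < 0:
        return math.inf                          # extension (2.4)
    if gamma == 0:                               # KL_m (empirical likelihood)
        return -math.log(x) + x - 1.0 if x > 0 else math.inf
    if gamma == 1:                               # KL, with the convention 0 log 0 = 0
        return x * math.log(x) - x + 1.0 if x > 0 else 1.0
    if x == 0:
        # x**gamma -> +inf when gamma < 0, and -> 0 when gamma > 0
        return math.inf if gamma < 0 else 1.0 / gamma
    return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))


if __name__ == "__main__":
    # gamma = -1, 0, 1/2, 1, 2 give chi^2_m, KL_m, Hellinger, KL and chi^2
    for g in (-1, 0, 0.5, 1, 2):
        print(g, phi_gamma(g, 1.0), phi_gamma(g, 2.5))
```

All the functions vanish at $x = 1$, and the cases $\gamma = 0$ and $\gamma = 1$ are the limits of (2.3) as $\gamma \to 0$ and $\gamma \to 1$.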

Definition 2.1. Let $\Omega$ be some subset of $\mathcal{M}$. The $\varphi$-divergence between the set $\Omega$ and a p.m. $P$ is defined by

$$D_\varphi(\Omega,P) := \inf_{Q \in \Omega} D_\varphi(Q,P).$$

A finite measure $Q^* \in \Omega$ such that $D_\varphi(Q^*,P) < \infty$ and

$$D_\varphi(Q^*,P) \le D_\varphi(Q,P) \quad \text{for all } Q \in \Omega$$

is called a projection of $P$ on $\Omega$. This projection may not exist, or may not be uniquely defined.

3. Minimum divergence estimates 

Let $X_1,\dots,X_n$ denote an i.i.d. sample of a random vector $X \in \mathbb{R}^m$ with distribution $P_0$. Let $P_n$ be the empirical measure pertaining to this sample, namely

$$P_n := \frac{1}{n}\sum_{i=1}^n \delta_{X_i},$$

where $\delta_x$ denotes the Dirac measure at point $x$, for all $x$. We place our statistical approach in the global context of s.f.m.'s with total mass 1 satisfying $l$ linear constraints:

$$\mathcal{M}_\theta := \left\{Q \in \mathcal{M} \text{ such that } \int dQ(x) = 1 \text{ and } \int g(x,\theta)\, dQ(x) = 0\right\} \tag{3.1}$$

and

$$\mathcal{M} := \bigcup_{\theta \in \Theta} \mathcal{M}_\theta, \tag{3.2}$$

sets of signed finite measures that replace $\mathcal{M}^1_\theta$ and $\mathcal{M}^1$. Enhancing the model (1.1) to the above one (3.2) brings a number of improvements upon existing results; this is argued at the end of the present Section; see also Remark 4.5 below. The "plug-in" estimate of $D_\varphi(\mathcal{M}_\theta,P_0)$ is

$$\widehat{D_\varphi}(\mathcal{M}_\theta,P_0) := \inf_{Q \in \mathcal{M}_\theta} D_\varphi(Q,P_n) = \inf_{Q \in \mathcal{M}_\theta} \int \varphi\left(\frac{dQ}{dP_n}(x)\right) dP_n(x). \tag{3.3}$$

If the projection $Q^{(n)}_\theta$ of $P_n$ on $\mathcal{M}_\theta$ exists, then it is clear that $Q^{(n)}_\theta$ is a s.f.m. (or possibly a p.m.) a.c.w.r.t. $P_n$; this means that the support of $Q^{(n)}_\theta$ must be included in the set $\{X_1,\dots,X_n\}$. So, define the sets

$$\mathcal{M}^{(n)}_\theta := \left\{Q \in \mathcal{M} \mid Q \text{ a.c.w.r.t. } P_n,\ \sum_{i=1}^n Q(X_i) = 1 \text{ and } \sum_{i=1}^n Q(X_i)\, g(X_i,\theta) = 0\right\}, \tag{3.4}$$

which may be seen as subsets of $\mathbb{R}^n$. Then, the plug-in estimate (3.3) can be written as

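$$\widehat{D_\varphi}(\mathcal{M}_\theta,P_0) = \inf_{Q \in \mathcal{M}^{(n)}_\theta} \frac{1}{n}\sum_{i=1}^n \varphi\big(nQ(X_i)\big). \tag{3.5}$$

In the same way, $D_\varphi(\mathcal{M},P_0) := \inf_{\theta \in \Theta}\inf_{Q \in \mathcal{M}_\theta} D_\varphi(Q,P_0)$ can be estimated by

$$\widehat{D_\varphi}(\mathcal{M},P_0) := \inf_{\theta \in \Theta}\inf_{Q \in \mathcal{M}^{(n)}_\theta} \frac{1}{n}\sum_{i=1}^n \varphi\big(nQ(X_i)\big). \tag{3.6}$$

By uniqueness of $\arg\inf_{\theta \in \Theta} D_\varphi(\mathcal{M}_\theta,P_0)$, and since the infimum is reached at $\theta = \theta_0$ under the model, we estimate $\theta_0$ through

$$\widehat{\theta}_\varphi := \arg\inf_{\theta \in \Theta}\inf_{Q \in \mathcal{M}^{(n)}_\theta} \frac{1}{n}\sum_{i=1}^n \varphi\big(nQ(X_i)\big). \tag{3.7}$$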

Enhancing $\mathcal{M}^1$ to $\mathcal{M}$, and accordingly extending the definitions of the $\varphi$ functions to $\mathbb{R}$ and of the $\varphi$-divergences to the whole space of s.f.m.'s $\mathcal{M}$, is motivated by the following arguments:

- If the domain $(a,b)$ of the function $\varphi$ is included in $[0,+\infty[$, then minimizing over $\mathcal{M}^1$ or over $\mathcal{M}$ leads to the same estimates and test statistics. This is the case for the KL$_m$, KL, modified $\chi^2$ and Hellinger divergences.

- Let $\theta$ be a given value in $\Theta$. Denote by $Q^{1(n)}_\theta$ and $Q^{(n)}_\theta$, respectively, the projections of $P_n$ on $\mathcal{M}^1_\theta$ and on $\mathcal{M}_\theta$. If $Q^{(n)}_\theta$ satisfies $0 < Q^{(n)}_\theta(X_i) < 1$ for all $i = 1,\dots,n$, then $Q^{1(n)}_\theta = Q^{(n)}_\theta$. Therefore, in this case, both approaches also lead to the same estimates and test statistics.

- It may occur that, for some $\theta$ in $\Theta$ and some $i = 1,\dots,n$, $Q^{1(n)}_\theta(X_i)$ is a boundary value of $[0,1]$; hence the first-order conditions are not met, which makes the calculation of the estimates over the sets of p.m.'s $\mathcal{M}^1_\theta$ and $\mathcal{M}^1$ genuinely difficult. However, when $\mathcal{M}^1$ is replaced by $\mathcal{M}$, this problem no longer occurs, in particular when $\operatorname{dom}\varphi = \mathbb{R}$, which is the case for the $\chi^2$-divergence. Other arguments are given in Remark 4.5 below.

The empirical likelihood paradigm (see Owen (1988), Owen (1990), Qin and Lawless (1994) and Owen (2001)) enters as a special case of the statistical issues related to estimation and tests based on $\varphi$-divergences with $\varphi(x) = \varphi_0(x) := -\log x + x - 1$, namely on the KL$_m$-divergence. Indeed, it is straightforward to see that the empirical log-likelihood ratio statistic for testing $\mathcal{H}_0: P_0 \in \mathcal{M}$ against $\mathcal{H}_1: P_0 \notin \mathcal{M}$ can be written, in the context of $\varphi$-divergences, as $2n\widehat{D_{KL_m}}(\mathcal{M},P_0)$, and that the EL estimate of $\theta_0$ can be written as $\widehat{\theta}_{KL_m} = \arg\inf_{\theta \in \Theta}\widehat{D_{KL_m}}(\mathcal{M}_\theta,P_0)$; see Remark 4.3 below. In the case of the power functions $\varphi = \varphi_\gamma$, the corresponding estimates (3.7) belong to the class of GEL estimates introduced by Smith (1997) and Newey and Smith (2004), and the statistics (3.5) are in this case the empirical Cressie-Read statistics introduced by Baggerly (1998) and Corcoran (1998); see Remark 4.4 below.

The constrained optimization problems (3.5), (3.6) and (3.7) can be transformed into unconstrained ones by making use of some "duality" arguments, which we briefly state below following Rockafellar (1970). On the other hand, obtaining asymptotic results for the estimates and the test statistics, under misspecification or under alternative hypotheses, requires handling existence conditions and the characterization of the projection of $P_0$ on the submodel $\mathcal{M}_\theta$ or on the model $\mathcal{M}$. This will also be considered through duality, in the following Section.



4. Dual representation of $\varphi$-divergences under constraints

This Section is central for our purposes. Indeed, it provides the explicit form of the proposed estimates by transforming the constrained problems (3.5) into unconstrained ones, using Lagrangian duality, which is a classical tool in optimization theory. This Section adapts this formalism to the context of divergences and to the present statistical setting. The Lagrangian "dual" problem, corresponding to the "primal" one

$$\inf_{Q \in \mathcal{M}_\theta} D_\varphi(Q,P_0), \tag{4.1}$$

and its empirical counterpart (3.5), make use of the so-called Fenchel-Legendre transform of $\varphi$, defined through

$$\psi: t \in \mathbb{R} \mapsto \psi(t) := \sup_{x \in \mathbb{R}}\{tx - \varphi(x)\}. \tag{4.2}$$
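The "dual" problems associated to (4.1) and (3.5) are respectively

$$\sup_{t \in \mathbb{R}^{1+l}}\left\{t_0 - \int \psi\Big(t_0 + \sum_{j=1}^l t_j g_j(x,\theta)\Big) dP_0(x)\right\} \tag{4.3}$$

and

$$\sup_{t \in \mathbb{R}^{1+l}}\left\{t_0 - \frac{1}{n}\sum_{i=1}^n \psi\Big(t_0 + \sum_{j=1}^l t_j g_j(X_i,\theta)\Big)\right\}. \tag{4.4}$$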

In the following Propositions 4.1 and 4.2, we state sufficient conditions under which the primal problems (4.1) and (3.5) coincide respectively with the dual ones (4.3) and (4.4). First, recall some properties of the convex conjugate $\psi$ of $\varphi$; for the proofs, we refer to Section 26 in Rockafellar (1970). The function $\psi$ is convex and closed, and its domain is an interval with endpoints

$$a^* := \lim_{x \to -\infty}\frac{\varphi(x)}{x}, \qquad b^* := \lim_{x \to +\infty}\frac{\varphi(x)}{x}, \tag{4.5}$$

satisfying $a^* < 0 < b^*$, with $\psi(0) = 0$. The strict convexity of $\varphi$ on its domain $(a,b)$ is equivalent to the condition that its conjugate $\psi$ is essentially smooth, i.e., differentiable with

$$\lim_{t \downarrow a^*}\psi'(t) = -\infty \text{ if } a^* \text{ is finite}, \qquad \lim_{t \uparrow b^*}\psi'(t) = +\infty \text{ if } b^* \text{ is finite}. \tag{4.6}$$

Conversely, $\varphi$ is essentially smooth on its domain $(a,b)$ if and only if $\psi$ is strictly convex on its domain $(a^*,b^*)$. In all the sequel, we assume additionally that $\varphi$ is essentially smooth. Hence, $\psi$ is strictly convex on its domain $(a^*,b^*)$, and it holds that

$$a^* = \lim_{x \downarrow a}\varphi'(x), \qquad b^* = \lim_{x \uparrow b}\varphi'(x)$$

and

$$\psi(t) = t\,{\varphi'}^{-1}(t) - \varphi\big({\varphi'}^{-1}(t)\big), \quad \text{for all } t \in ]a^*,b^*[, \tag{4.7}$$
where ${\varphi'}^{-1}$ denotes the inverse function of $\varphi'$. It holds also that $\psi$ is twice continuously differentiable on $]a^*,b^*[$ with

$$\psi'(t) = {\varphi'}^{-1}(t) \quad \text{and} \quad \psi''(t) = \frac{1}{\varphi''\big({\varphi'}^{-1}(t)\big)}. \tag{4.8}$$

In particular, $\psi'(0) = 1$ and $\psi''(0) = 1/\varphi''(1)$, which equals 1 for all the power functions $\varphi_\gamma$. Obviously, since $\varphi$ is assumed to be closed, we have

$$\varphi(a) = \lim_{x \downarrow a}\varphi(x) \quad \text{and} \quad \varphi(b) = \lim_{x \uparrow b}\varphi(x),$$

which may be finite or infinite. Hence, by closedness of $\psi$, we have

$$\psi(a^*) = \lim_{t \downarrow a^*}\psi(t) \quad \text{and} \quad \psi(b^*) = \lim_{t \uparrow b^*}\psi(t).$$

Finally, the first and second derivatives of $\varphi$ at $a$ and $b$ are defined to be the limits of $\varphi'(x)$ and $\varphi''(x)$ when $x \downarrow a$ and when $x \uparrow b$. The first and second derivatives of $\psi$ at $a^*$ and $b^*$ are defined in a similar way. In Table 1, we give the convex conjugates $\psi$ of some standard functions $\varphi$, associated to some standard divergences. We also determine their domains, $(a,b)$ and $(a^*,b^*)$.

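Table 1. Convex conjugates for some standard divergences.

$$
\begin{array}{lllll}
\text{Divergence} & \varphi & \operatorname{dom}\varphi & \operatorname{dom}\psi & \psi \\
\hline
D_{KL_m} & \varphi(x) := -\log x + x - 1 & ]0,+\infty[ & ]-\infty,1[ & \psi(t) = -\log(1-t) \\
D_{KL} & \varphi(x) := x\log x - x + 1 & [0,+\infty[ & \mathbb{R} & \psi(t) = e^t - 1 \\
D_{\chi^2_m} & \varphi(x) := \frac{1}{2}\frac{(x-1)^2}{x} & ]0,+\infty[ & ]-\infty,\frac{1}{2}] & \psi(t) = 1 - \sqrt{1-2t} \\
D_{\chi^2} & \varphi(x) := \frac{1}{2}(x-1)^2 & \mathbb{R} & \mathbb{R} & \psi(t) = \frac{1}{2}t^2 + t \\
D_H & \varphi(x) := 2(\sqrt{x}-1)^2 & [0,+\infty[ & ]-\infty,2[ & \psi(t) = \frac{2t}{2-t} \\
D_{\varphi_\gamma} & \varphi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma-1)} & & & \psi(t) = \frac{1}{\gamma}\big((\gamma-1)t + 1\big)^{\frac{\gamma}{\gamma-1}} - \frac{1}{\gamma}
\end{array}
$$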
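As a quick sanity check of Table 1, the conjugate (4.2) can be approximated by maximizing $tx - \varphi(x)$ over a fine grid and compared with the closed forms. The sketch below is illustrative only: the grid $]0, 50]$ with step $10^{-3}$ is an arbitrary choice that contains the maximizer ${\varphi'}^{-1}(t)$ for the values of $t$ tried (for $\chi^2$, whose domain is $\mathbb{R}$, restricting to $x > 0$ is harmless only because ${\varphi'}^{-1}(t) = t + 1 > 0$ for $t > -1$). The names `TABLE_1` and `conjugate_on_grid` are hypothetical.

```python
import math

# (phi, closed-form psi) pairs from Table 1
TABLE_1 = {
    "KL_m": (lambda x: -math.log(x) + x - 1, lambda t: -math.log(1 - t)),
    "KL": (lambda x: x * math.log(x) - x + 1, lambda t: math.exp(t) - 1),
    "chi2_m": (lambda x: 0.5 * (x - 1) ** 2 / x, lambda t: 1 - math.sqrt(1 - 2 * t)),
    "chi2": (lambda x: 0.5 * (x - 1) ** 2, lambda t: 0.5 * t * t + t),
    "Hellinger": (lambda x: 2 * (math.sqrt(x) - 1) ** 2, lambda t: 2 * t / (2 - t)),
}


def conjugate_on_grid(phi, t, x_max=50.0, step=1e-3):
    """Grid approximation of psi(t) = sup_x {t x - phi(x)}, restricted to x in ]0, x_max]."""
    n_points = int(round(x_max / step))
    return max(t * (k * step) - phi(k * step) for k in range(1, n_points + 1))


if __name__ == "__main__":
    for name, (phi, psi) in TABLE_1.items():
        for t in (-0.5, 0.2, 0.4):
            approx = conjugate_on_grid(phi, t)
            print(f"{name:9s} t={t:+.1f} grid={approx:.6f} closed form={psi(t):.6f}")
```

Since the grid value is a maximum over a subset, it can only underestimate $\psi(t)$; the gap shrinks quadratically with the step.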









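Proposition 4.1. Let $\theta$ be a given value in $\Theta$. If there exists $Q_0$ in $\mathcal{M}^{(n)}_\theta$ such that

$$a < Q_0(X_i) < b, \quad \text{for all } i = 1,\dots,n, \tag{4.9}$$

then

$$\inf_{Q \in \mathcal{M}^{(n)}_\theta} D_\varphi(Q,P_n) = \sup_{t \in \mathbb{R}^{1+l}}\left\{t_0 - \frac{1}{n}\sum_{i=1}^n \psi\Big(t_0 + \sum_{j=1}^l t_j g_j(X_i,\theta)\Big)\right\} \tag{4.10}$$

with dual attainment. Conversely, if there exists some dual optimal solution $\widehat{t} := (\widehat{t}_0, \widehat{t}_1,\dots,\widehat{t}_l)^T \in \mathbb{R}^{1+l}$ such that

$$a^* < \widehat{t}_0 + \sum_{j=1}^l \widehat{t}_j g_j(X_i,\theta) < b^*, \quad \text{for all } i = 1,\dots,n, \tag{4.11}$$

then the equality (4.10) holds, and the unique optimal solution of the primal problem $\inf_{Q \in \mathcal{M}^{(n)}_\theta} D_\varphi(Q,P_n)$, namely the projection of $P_n$ on $\mathcal{M}_\theta$, is given by

$$Q^{(n)}_\theta(X_i) = \frac{1}{n}\,{\varphi'}^{-1}\Big(\widehat{t}_0 + \sum_{j=1}^l \widehat{t}_j g_j(X_i,\theta)\Big), \quad i = 1,\dots,n,$$

where $\widehat{t} := (\widehat{t}_0, \widehat{t}_1,\dots,\widehat{t}_l)^T$ is solution of the system of equations

$$\begin{cases} 1 - \dfrac{1}{n}\displaystyle\sum_{i=1}^n {\varphi'}^{-1}\Big(\widehat{t}_0 + \sum_{j=1}^l \widehat{t}_j g_j(X_i,\theta)\Big) = 0, \\ -\dfrac{1}{n}\displaystyle\sum_{i=1}^n g_j(X_i,\theta)\,{\varphi'}^{-1}\Big(\widehat{t}_0 + \sum_{k=1}^l \widehat{t}_k g_k(X_i,\theta)\Big) = 0, \quad j = 1,\dots,l. \end{cases}$$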
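Proposition 4.1 turns the computation of the projection into a smooth concave maximization over $t \in \mathbb{R}^{1+l}$. The following sketch, which is not taken from the paper, maximizes the empirical dual (4.4) by a damped Newton method and then recovers $Q^{(n)}_\theta(X_i) = \frac{1}{n}\psi'\big(\widehat{t}_0 + \sum_j \widehat{t}_j g_j(X_i,\theta)\big)$, using $\psi' = {\varphi'}^{-1}$ from (4.8). It is illustrated with the KL divergence ($\psi(t) = e^t - 1$, see Table 1), the moment function $g(x,\theta) = x - \theta$ and simulated data; the name `dual_newton`, the starting point $t = 0$ and the backtracking rule are our own choices.

```python
import numpy as np


def dual_newton(g, psi, dpsi, d2psi, tol=1e-10, max_iter=100):
    """Maximize t0 - (1/n) sum_i psi(t0 + sum_j t_j g_j(X_i, theta)), i.e. the dual (4.4).

    g: (n, l) array whose i-th row is g(X_i, theta) for a fixed theta.
    """
    n, l = g.shape
    G = np.hstack([np.ones((n, 1)), g])       # i-th row: (1, g_1(X_i), ..., g_l(X_i))
    e0 = np.eye(l + 1)[0]

    def objective(t):
        return t[0] - np.mean(psi(G @ t))

    t = np.zeros(l + 1)                        # t = 0 corresponds to Q = P_n
    for _ in range(max_iter):
        u = G @ t
        grad = e0 - G.T @ dpsi(u) / n
        if np.max(np.abs(grad)) < tol:
            break
        hess = -(G.T * d2psi(u)) @ G / n       # negative definite for strictly convex psi
        step = np.linalg.solve(hess, -grad)    # Newton ascent direction
        s = 1.0
        while objective(t + s * step) < objective(t) and s > 1e-10:
            s *= 0.5                           # backtracking: do not decrease the objective
        t = t + s * step
    return t


# Example: KL divergence, psi(t) = exp(t) - 1, hence psi' = psi'' = exp
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=200)
g = (x - 0.9)[:, None]                         # g(x, theta) = x - theta at theta = 0.9
t_hat = dual_newton(g, lambda u: np.exp(u) - 1, np.exp, np.exp)

u = t_hat[0] + g @ t_hat[1:]
Q = np.exp(u) / len(x)                         # projection of P_n on M_theta
dual_value = t_hat[0] - np.mean(np.exp(u) - 1)
v = len(x) * Q                                 # dQ/dP_n at the data points
primal_value = np.mean(v * np.log(v) - v + 1)  # (1/n) sum_i phi(n Q(X_i))
```

At convergence the two constraints of $\mathcal{M}^{(n)}_\theta$ hold and the primal and dual values agree, as stated in (4.10).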



Remark 4.1. For the $\chi^2$-divergence, we have $a = -\infty$ and $b = +\infty$. Hence, condition (4.9) holds whenever $\mathcal{M}^{(n)}_\theta$ is not void. More generally, the above Proposition holds for any $\varphi$-divergence with $\operatorname{dom}\varphi = \mathbb{R}$.

Remark 4.2. Assume that $g(x,\theta) := x - \theta$. Then, for any divergence $D_\varphi$ with $\operatorname{dom}\varphi = ]0,+\infty[$, which is the case of the modified $\chi^2$ divergence and of the modified Kullback-Leibler divergence (or, equivalently, of the EL method), condition (4.9) means that $\theta$ is an interior point of the convex hull of the data $(X_1,\dots,X_n)$. This is precisely what is checked in Owen (1990), p. 100, for the EL method; see also Owen (2001).



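For the asymptotic counterpart of the above results, we have the following; see Theorem 1 in Broniatowski and Keziou (2006).

Proposition 4.2. Let $\theta$ be a given value in $\Theta$. Assume that $\int |g_j(x,\theta)|\, dP_0(x) < \infty$ for all $j = 1,\dots,l$. If there exists $Q_0$ in $\mathcal{M}_\theta$ with $D_\varphi(Q_0,P_0) < \infty$ and²

$$a < \inf_{x \in \mathbb{R}^m}\frac{dQ_0}{dP_0}(x) \le \sup_{x \in \mathbb{R}^m}\frac{dQ_0}{dP_0}(x) < b, \quad P_0\text{-a.s.}, \tag{4.12}$$

then

$$\inf_{Q \in \mathcal{M}_\theta} D_\varphi(Q,P_0) = \sup_{t \in \mathbb{R}^{1+l}}\left\{t_0 - \int \psi\Big(t_0 + \sum_{j=1}^l t_j g_j(x,\theta)\Big) dP_0(x)\right\} \tag{4.13}$$

with dual attainment. Conversely, if there exists some dual optimal solution $t^*$ which is an interior point of the set

$$\left\{t \in \mathbb{R}^{1+l} \text{ such that } \int_{\mathbb{R}^m}\Big|\psi\Big(t_0 + \sum_{j=1}^l t_j g_j(x,\theta)\Big)\Big|\, dP_0(x) < \infty\right\}, \tag{4.14}$$

then the dual equality (4.13) holds, and the unique optimal solution $Q^*_\theta$ of the primal problem $\inf_{Q \in \mathcal{M}_\theta} D_\varphi(Q,P_0)$, namely the projection of $P_0$ on $\mathcal{M}_\theta$, is given by

$$\frac{dQ^*_\theta}{dP_0}(x) = {\varphi'}^{-1}\Big(t^*_0 + \sum_{j=1}^l t^*_j g_j(x,\theta)\Big),$$

where $t^* := (t^*_0, t^*_1,\dots,t^*_l)^T$ is solution of the system of equations

$$\begin{cases} 1 - \displaystyle\int {\varphi'}^{-1}\Big(t^*_0 + \sum_{j=1}^l t^*_j g_j(x,\theta)\Big) dP_0(x) = 0, \\ -\displaystyle\int g_j(x,\theta)\,{\varphi'}^{-1}\Big(t^*_0 + \sum_{k=1}^l t^*_k g_k(x,\theta)\Big) dP_0(x) = 0, \quad j = 1,\dots,l. \end{cases} \tag{4.15}$$

Furthermore, $t^*$ is unique if the functions $\mathbb{1}_{\mathbb{R}^m}, g_1(\cdot,\theta),\dots,g_l(\cdot,\theta)$ are linearly independent in the sense that $P_0\{x \in \mathbb{R}^m \mid t_0 + \sum_{j=1}^l t_j g_j(x,\theta) \neq 0\} > 0$ for all $t \in \mathbb{R}^{1+l}$, $t \neq 0$.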

² The strict inequalities (4.12) mean that $P_0\{x \in \mathbb{R}^m \mid \frac{dQ_0}{dP_0}(x) \le a\} = P_0\{x \in \mathbb{R}^m \mid \frac{dQ_0}{dP_0}(x) \ge b\} = 0$.

For the sake of brevity and clarity, we introduce some additional notation. In all the sequel, $\|x\|$ denotes the norm of $x$ defined by $\|x\| := \sup_i |x_i|$ for any vector $x := (x_1,\dots,x_k)^T \in \mathbb{R}^k$, and, for any matrix $A$, the norm of $A$ is defined by $\|A\| := \sup_{i,j}|a_{i,j}|$. Denote by $\bar{g}$ the vector valued function $\bar{g} := (\mathbb{1}_{\mathbb{R}^m}, g_1,\dots,g_l)^T \in \mathbb{R}^{1+l}$. For any p.m. $P$ on $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$ and any real measurable function $f$ from $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$ to $(\mathbb{R},\mathcal{B}(\mathbb{R}))$, denote

$$Pf := \int f(x)\, dP(x).$$

Let

$$t^T\bar{g}(x,\theta) := t_0 + \sum_{j=1}^l t_j g_j(x,\theta)$$

and

$$m(x,\theta,t) := t_0 - \psi\big(t^T\bar{g}(x,\theta)\big), \quad \text{for all } x \in \mathbb{R}^m,\ \theta \in \Theta \subset \mathbb{R}^d,\ t \in \mathbb{R}^{1+l}. \tag{4.16}$$

Note that the sup in (4.10) and (4.13) can be restricted, respectively, to the sets

$$\Lambda^{(n)}_\theta := \left\{t \in \mathbb{R}^{1+l} \mid a^* < t^T\bar{g}(X_i,\theta) < b^*, \text{ for all } i = 1,\dots,n\right\} \tag{4.17}$$

and

$$\Lambda_\theta := \left\{t \in \mathbb{R}^{1+l} \mid \int_{\mathbb{R}^m}\Big|\psi\Big(t_0 + \sum_{j=1}^l t_j g_j(x,\theta)\Big)\Big|\, dP_0(x) < \infty\right\}. \tag{4.18}$$

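In view of the above two Propositions 4.1 and 4.2, we redefine the estimates (3.5), (3.6) and (3.7) as follows:

$$\widehat{D_\varphi}(\mathcal{M}_\theta,P_0) := \sup_{t \in \Lambda^{(n)}_\theta}\frac{1}{n}\sum_{i=1}^n m(X_i,\theta,t) =: \sup_{t \in \Lambda^{(n)}_\theta} P_n m(\theta,t), \tag{4.19}$$

$$\widehat{D_\varphi}(\mathcal{M},P_0) := \inf_{\theta \in \Theta}\sup_{t \in \Lambda^{(n)}_\theta}\frac{1}{n}\sum_{i=1}^n m(X_i,\theta,t) =: \inf_{\theta \in \Theta}\sup_{t \in \Lambda^{(n)}_\theta} P_n m(\theta,t) \tag{4.20}$$

and

$$\widehat{\theta}_\varphi := \arg\inf_{\theta \in \Theta}\sup_{t \in \Lambda^{(n)}_\theta}\frac{1}{n}\sum_{i=1}^n m(X_i,\theta,t) =: \arg\inf_{\theta \in \Theta}\sup_{t \in \Lambda^{(n)}_\theta} P_n m(\theta,t). \tag{4.21}$$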



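Remark 4.3. When $\varphi(x) = -\log x + x - 1$, the estimate (3.7) clearly coincides with the EL one, so it can be seen as the value of the parameter which minimizes the KL$_m$-divergence between the model $\mathcal{M}$ and the empirical measure $P_n$ of the data $X_1,\dots,X_n$. The statistic $2n\widehat{D_{KL_m}}(\mathcal{M},P_0)$, see (3.6), coincides with the empirical likelihood ratio statistic associated to the null hypothesis $\mathcal{H}_0: P_0 \in \mathcal{M}$ against the alternative $\mathcal{H}_1: P_0 \notin \mathcal{M}$. The dual representation of $\widehat{D_{KL_m}}(\mathcal{M},P_0)$, see (4.20) and (4.10), is

$$\widehat{D_{KL_m}}(\mathcal{M},P_0) = \inf_{\theta \in \Theta}\sup_{t \in \Lambda^{(n)}_\theta}\left\{t_0 + \frac{1}{n}\sum_{i=1}^n \log\Big(1 - t_0 - \sum_{j=1}^l t_j g_j(X_i,\theta)\Big)\right\}.$$

For a given $\theta \in \Theta$, the KL$_m$-projection $Q^{(n)}_\theta$ of $P_n$ on $\mathcal{M}_\theta$ is given by (see Proposition 4.1)

$$Q^{(n)}_\theta(X_i) = \frac{1}{n}\Big(1 - \widehat{t}_0 - \sum_{j=1}^l \widehat{t}_j g_j(X_i,\theta)\Big)^{-1}, \quad i = 1,\dots,n.$$

Since $nQ^{(n)}_\theta(X_i)\big(1 - \widehat{t}_0 - \sum_{j=1}^l \widehat{t}_j g_j(X_i,\theta)\big) = 1$ for each $i$, summing upon $i = 1,\dots,n$ and using the two constraints defining $\mathcal{M}^{(n)}_\theta$ yields $n(1 - \widehat{t}_0) = n$, that is $\widehat{t}_0 = 0$. Therefore, $t_0$ can be omitted and, after changing $(t_1,\dots,t_l)$ into $(-t_1,\dots,-t_l)$, the above representation can be rewritten as follows:

$$\widehat{D_{KL_m}}(\mathcal{M},P_0) = \inf_{\theta \in \Theta}\sup_{t}\left\{\frac{1}{n}\sum_{i=1}^n \log\Big(1 + \sum_{j=1}^l t_j g_j(X_i,\theta)\Big)\right\},$$

$$\widehat{\theta}_{KL_m} = \widehat{\theta}_{EL} = \arg\inf_{\theta \in \Theta}\sup_{t}\left\{\frac{1}{n}\sum_{i=1}^n \log\Big(1 + \sum_{j=1}^l t_j g_j(X_i,\theta)\Big)\right\}, \tag{4.22}$$

in which the sup is taken over the set

$$\left\{(t_1,\dots,t_l)^T \in \mathbb{R}^l \mid -1 < \sum_{j=1}^l t_j g_j(X_i,\theta) < +\infty, \text{ for all } i = 1,\dots,n\right\},$$

and then

$$Q^{(n)}_\theta(X_i) = \frac{1}{n}\Big(1 + \sum_{j=1}^l \widehat{t}_j g_j(X_i,\theta)\Big)^{-1}, \quad i = 1,\dots,n.$$

The formula (4.22) is the ordinary dual representation of the EL estimate; see Qin and Lawless (1994) and Owen (2001).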
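For a scalar mean, $g(x,\theta) = x - \theta$ with $l = 1$, the inner supremum in (4.22) reduces to one-dimensional root finding: the derivative $\sum_i g_i/(1 + t g_i)$, with $g_i := X_i - \theta$, is strictly decreasing on the interval where all $1 + t g_i > 0$. The sketch below (the function name `el_mean` is hypothetical, and bisection is our choice of solver) returns $\widehat{t}$, the weights $Q^{(n)}_\theta(X_i) = \frac{1}{n}(1 + \widehat{t} g_i)^{-1}$, and $2n\widehat{D_{KL_m}}(\mathcal{M}_\theta,P_0) = 2\sum_i \log(1 + \widehat{t} g_i)$, the EL ratio statistic at $\theta$. Its first check is the condition of Remark 4.2: $\theta$ must lie strictly inside the convex hull of the data.

```python
import math


def el_mean(x, theta, iters=200):
    """EL for a scalar mean via the dual (4.22), with l = 1 and g(x, theta) = x - theta.

    Returns (t_hat, weights, statistic): weights[i] = Q(X_i) and
    statistic = 2 * sum_i log(1 + t_hat * g_i), the EL ratio statistic at theta.
    """
    g = [xi - theta for xi in x]
    g_min, g_max = min(g), max(g)
    if not g_min < 0 < g_max:
        # condition (4.9), see Remark 4.2: theta interior to the convex hull of the data
        raise ValueError("theta must lie strictly between min(x) and max(x)")
    lo, hi = -1.0 / g_max, -1.0 / g_min    # open interval where every 1 + t g_i > 0

    def score(t):                          # derivative of sum_i log(1 + t g_i), decreasing in t
        return sum(gi / (1.0 + t * gi) for gi in g)

    for _ in range(iters):                 # bisection on the decreasing score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    t_hat = 0.5 * (lo + hi)
    n = len(g)
    weights = [1.0 / (n * (1.0 + t_hat * gi)) for gi in g]
    statistic = 2.0 * sum(math.log1p(t_hat * gi) for gi in g)
    return t_hat, weights, statistic


t_hat, weights, statistic = el_mean([0.2, 1.5, -0.3, 2.1, 0.8, 1.1, -0.7, 0.5], theta=0.4)
```

At the sample mean the statistic is zero and all weights equal $1/n$; it grows as $\theta$ moves toward the edge of the convex hull.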



Remark 4.4. Consider the power divergences, associated to the power functions $\varphi_\gamma$; see (2.3) and (2.4). We show that the estimates $\widehat{\theta}_{\varphi_\gamma}$ belong to the class of GEL estimators introduced by Smith (1997) and Newey and Smith (2004). The projection of $P_n$ on $\mathcal{M}_\theta$ is given by

$$Q^{(n)}_\theta(X_i) = \frac{1}{n}\Big((\gamma-1)\Big(\widehat{t}_0 + \sum_{j=1}^l \widehat{t}_j g_j(X_i,\theta)\Big) + 1\Big)^{\frac{1}{\gamma-1}}, \quad i = 1,\dots,n.$$

Using the constraint $\sum_{i=1}^n Q^{(n)}_\theta(X_i) = 1$, we can express $t_0$ in terms of $t_1,\dots,t_l$, and hence the sup in the dual representation (4.21) can be reduced to a subset of $\mathbb{R}^l$, as in Newey and Smith (2004). When $\varphi(x) = \frac{1}{2}(x-1)^2$, it is straightforward to see that the corresponding estimate $\widehat{\theta}_\varphi$ coincides with the continuous updating estimator of Hansen et al. (1996).

Remark 4.5. (Numerical calculation of the estimates and the specific role of the $\chi^2$-divergence). The computation of $\widehat{t}(\theta)$ for fixed $\theta \in \Theta$, as the solution of the empirical counterpart of (4.15) given in Proposition 4.1, is difficult when handling a generic divergence. In the particular case of the $\chi^2$-divergence, i.e., when $\varphi(x) = \frac{1}{2}(x-1)^2$, optimizing on all s.f.m.'s, this system is linear; we thus easily obtain an explicit form for $\widehat{t}(\theta)$, which in turn allows for a single gradient descent when optimizing upon $\theta$. This procedure is useful in order to compute the estimates for all other divergences (for which the corresponding system is nonlinear), including EL, since it provides an easy starting point for the resulting double gradient descent. Moreover, Hjort et al. (2009) extend the EL approach to more general moment condition models, allowing the number of constraints to increase with the sample size. In this case, the computation of the EL estimate is more complex, and the same idea as above can help to solve the problem.
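The linear case of Remark 4.5 can be made concrete. For $\varphi(x) = \frac{1}{2}(x-1)^2$ we have $\psi(t) = \frac{1}{2}t^2 + t$ and $\psi'(t) = t + 1$ (Table 1 and (4.8)), so the first-order conditions of the dual (4.4) read $M\widehat{t} = -(0, \bar{g}_n^T)^T$, with $M := \frac{1}{n}\sum_i \bar{g}(X_i,\theta)\bar{g}(X_i,\theta)^T$ and $\bar{g}_n := \frac{1}{n}\sum_i g(X_i,\theta)$. A short block-inverse computation then gives the optimal value $\frac{1}{2}\bar{g}_n^T\widehat{\Omega}^{-1}\bar{g}_n$, where $\widehat{\Omega}$ is the centered sample covariance of the $g(X_i,\theta)$ (divisor $n$), which is a continuous-updating quadratic form, in line with Remark 4.4. The sketch below checks this numerically; the name `chi2_dual`, the two moment conditions $\mathbb{E}[X - \theta] = 0$ and $\mathbb{E}[X^2 - \theta^2 - 1] = 0$, and the simulated data are illustrative assumptions. Note that the weights $Q(X_i)$ may be negative, since the projection is taken on signed measures.

```python
import numpy as np


def chi2_dual(g):
    """Closed-form solution of the dual (4.4) for phi(x) = (x - 1)^2 / 2.

    g: (n, l) array whose i-th row is g(X_i, theta) for a fixed theta.
    Returns t_hat, the weights Q(X_i) of the projection and the dual value (4.19).
    """
    n, _ = g.shape
    G = np.hstack([np.ones((n, 1)), g])         # i-th row: (1, g(X_i, theta))
    M = G.T @ G / n
    b = np.concatenate([[0.0], g.mean(axis=0)])
    t_hat = np.linalg.solve(M, -b)              # first-order conditions are linear
    u = G @ t_hat
    Q = (u + 1.0) / n                           # psi'(u) = u + 1; entries may be negative
    dual_value = t_hat[0] - np.mean(0.5 * u**2 + u)
    return t_hat, Q, dual_value


rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=100)
theta = 0.2
g = np.column_stack([x - theta, x**2 - theta**2 - 1.0])
t_hat, Q, d_hat = chi2_dual(g)

g_bar = g.mean(axis=0)
omega_hat = np.cov(g, rowvar=False, bias=True)  # centered covariance, divisor n
cu_value = 0.5 * g_bar @ np.linalg.solve(omega_hat, g_bar)
primal_value = np.mean(0.5 * (len(x) * Q - 1.0) ** 2)  # (1/n) sum_i phi(n Q(X_i))
```

Minimizing `d_hat` over $\theta$ therefore requires only one outer optimization, which is the starting point suggested in Remark 4.5.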

5. Asymptotic properties of the estimates of the parameter and the divergences

5.1. Asymptotic properties under the model. This Section addresses Problems 1 and 2, aiming at testing the null hypothesis $\mathcal{H}_0: P_0 \in \mathcal{M}$ against the alternative $\mathcal{H}_1: P_0 \notin \mathcal{M}$. We derive the limiting distributions of the proposed test statistics, which are the estimated divergences between the model $\mathcal{M}$ and $P_0$. We also derive the limiting distributions of the estimates of $\theta_0$. The following two Theorems 5.1 and 5.2 extend Theorems 3.1 and 3.2 in Newey and Smith (2004) to the context of the divergence based approach. The Assumptions which we consider match those of Theorems 3.1 and 3.2 in Newey and Smith (2004).
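Assumption 1. a) $P_0 \in \mathcal{M}$ and $\theta_0 \in \Theta$ is the unique solution of $\mathbb{E}[g(X,\theta)] = 0$; b) $\Theta \subset \mathbb{R}^d$ is compact; c) $g(X,\theta)$ is continuous at each $\theta \in \Theta$ with probability one; d) $\mathbb{E}[\sup_{\theta \in \Theta}\|g(X,\theta)\|^\alpha] < \infty$ for some $\alpha > 2$; e) the matrix $\Omega := \mathbb{E}[g(X,\theta_0)g(X,\theta_0)^T]$ is nonsingular.

Theorem 5.1. Under Assumption 1, with probability approaching one as $n \to \infty$, the estimate $\widehat{\theta}_\varphi$ exists, and converges to $\theta_0$ in probability. Moreover, $\frac{1}{n}\sum_{i=1}^n g(X_i,\widehat{\theta}_\varphi) = O_P(1/\sqrt{n})$, $\widehat{t}(\widehat{\theta}_\varphi) := \arg\sup_{t \in \Lambda^{(n)}_{\widehat{\theta}_\varphi}} P_n m(\widehat{\theta}_\varphi,t)$ exists and belongs to $\operatorname{int}(\Lambda^{(n)}_{\widehat{\theta}_\varphi})$ with probability approaching one as $n \to \infty$, and $\widehat{t}(\widehat{\theta}_\varphi) = O_P(1/\sqrt{n})$.

In order to obtain asymptotic normality, we need some additional Assumptions. Denote by $G$ the matrix $G := \mathbb{E}[\partial g(X,\theta_0)/\partial\theta]$.

Assumption 2. a) $\theta_0 \in \operatorname{int}(\Theta)$; b) with probability one, $g(X,\theta)$ is continuously differentiable in a neighborhood $N_{\theta_0}$ of $\theta_0$, and $\mathbb{E}[\sup_{\theta \in N_{\theta_0}}\|\partial g(X,\theta)/\partial\theta\|] < \infty$; c) $\operatorname{rank}(G) = d$.

Theorem 5.2. Assume that Assumptions 1 and 2 hold. Then,

1) $\sqrt{n}(\widehat{\theta}_\varphi - \theta_0)$ converges in distribution to a centered normal random vector with covariance matrix

$$V := \left[G^T\Omega^{-1}G\right]^{-1}.$$

2) If $l > d$, then the statistic $2n\widehat{D_\varphi}(\mathcal{M},P_0)$ converges in distribution to a $\chi^2$ random variable with $(l - d)$ degrees of freedom.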

Remark 5.1. The above Theorem allows one to perform statistical tests (of the model) with asymptotic level $\alpha \in ]0,1[$. Consider the null hypothesis

$$\mathcal{H}_0: P_0 \in \mathcal{M} \quad \text{against the alternative} \quad \mathcal{H}_1: P_0 \notin \mathcal{M}. \tag{5.1}$$

The critical region is then

$$C_\varphi := \left\{2n\widehat{D_\varphi}(\mathcal{M},P_0) > q_{(1-\alpha)}\right\},$$

where $q_{(1-\alpha)}$ is the $(1-\alpha)$-quantile of the $\chi^2(l-d)$ distribution. When $\varphi(x) = -\log x + x - 1$, it is straightforward to see that the corresponding test is the empirical likelihood ratio one; see Qin and Lawless (1994).
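To illustrate Theorem 5.2 and the critical region $C_\varphi$, here is a sketch (not the authors' procedure) for the $\chi^2$-divergence, for which the inner supremum has the closed form $\frac{1}{2}\bar{g}_n^T\widehat{\Omega}^{-1}\bar{g}_n$, with $\widehat{\Omega}$ the centered sample covariance of the $g(X_i,\theta)$. We assume the hypothetical over-identified model $g(x,\theta) = (x - \theta, x^2 - \theta^2 - 1)^T$ (mean $\theta$, unit variance), so that $l - d = 1$; the outer infimum over $\theta$ is taken on a grid, and the $\chi^2(1)$ quantile is obtained as the square of a standard normal quantile. The function names are illustrative.

```python
from statistics import NormalDist

import numpy as np


def two_n_chi2_divergence(x, theta):
    """2n times the chi^2 estimate of D(M_theta, P_0) for g(x, theta) = (x - theta, x^2 - theta^2 - 1)."""
    g = np.column_stack([x - theta, x**2 - theta**2 - 1.0])
    g_bar = g.mean(axis=0)
    omega_hat = np.cov(g, rowvar=False, bias=True)
    return len(x) * g_bar @ np.linalg.solve(omega_hat, g_bar)


def chi2_model_test(x, alpha=0.05, grid=None):
    """Test H0: P0 in M at asymptotic level alpha (Remark 5.1), here with l - d = 1."""
    if grid is None:
        grid = np.linspace(-3.0, 3.0, 6001)
    stats = np.array([two_n_chi2_divergence(x, th) for th in grid])
    k = int(np.argmin(stats))                       # theta_hat on the grid, cf. (3.7)
    q = NormalDist().inv_cdf(1 - alpha / 2) ** 2    # (1 - alpha)-quantile of chi^2(1)
    return grid[k], stats[k], q, bool(stats[k] > q)


rng = np.random.default_rng(2)
x = rng.normal(loc=0.5, scale=1.0, size=300)
theta_hat, stat, q, reject = chi2_model_test(x)
```

A grid is used only to keep the sketch short; in practice a one-dimensional optimizer over $\theta$ would replace it.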

5.2. Asymptotic properties of the estimates of the divergences for a given value of the parameter. For a given $\theta \in \Theta$, consider the test problem of the null hypothesis $\mathcal{H}_0: P_0 \in \mathcal{M}_\theta$ against two different families of alternative hypotheses: $\mathcal{H}_1: P_0 \notin \mathcal{M}_\theta$ and $\mathcal{H}'_1: P_0 \in \mathcal{M} \setminus \mathcal{M}_\theta$. Those two tests address different situations, since $\mathcal{H}_1$ may include misspecification of the model. We give two different test statistics, each pertaining to one of the situations, and derive their limiting distributions both under $\mathcal{H}_0$ and under the alternatives. As a by-product, we also derive confidence areas for the true value $\theta_0$ of the parameter. We first state the convergence in probability of $\widehat{D_\varphi}(\mathcal{M}_\theta,P_0)$ to $D_\varphi(\mathcal{M}_\theta,P_0)$, and then we obtain the limiting distribution of $\widehat{D_\varphi}(\mathcal{M}_\theta,P_0)$ both when $P_0 \in \mathcal{M}_\theta$ and when $P_0 \notin \mathcal{M}_\theta$. Obviously, when $P_0 \in \mathcal{M}_\theta$, this means that $\theta = \theta_0$, since the true value $\theta_0$ of the parameter is assumed to be unique.

Assumption 3. a) $P_0 \in \mathcal{M}_\theta$ and $\theta$ is the unique solution of $\mathbb{E}[g(X,\theta)] = 0$; b) $\mathbb{E}[\|g(X,\theta)\|^\alpha] < \infty$ for some $\alpha > 2$; c) the matrix $\Omega := \mathbb{E}[g(X,\theta)g(X,\theta)^T]$ is nonsingular.

Theorem 5.3. Under Assumption 3, we have

1) $\widehat{t}(\theta) := \arg\sup_{t \in \Lambda^{(n)}_\theta} P_n m(\theta,t)$ exists and belongs to $\operatorname{int}(\Lambda^{(n)}_\theta)$ with probability approaching one as $n \to \infty$, and $\widehat{t}(\theta) = O_P(1/\sqrt{n})$.

2) The statistic $2n\widehat{D_\varphi}(\mathcal{M}_\theta,P_0)$ converges in distribution to a $\chi^2(l)$ random variable.

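In order to obtain the limiting distribution of the test statistic $2n\widehat{D_\varphi}(\mathcal{M}_\theta,P_0)$ under the alternative $\mathcal{H}_1: P_0 \notin \mathcal{M}_\theta$, including misspecification, the following Assumption is needed.

Assumption 4. a) $P_0 \notin \mathcal{M}_\theta$, and $t^*(\theta) := \arg\sup_{t \in \Lambda_\theta}\mathbb{E}[m(X,\theta,t)]$ exists and is an interior point of $\Lambda_\theta$; b) $\mathbb{E}[\sup_{t \in N_{t^*(\theta)}}|m(X,\theta,t)|] < \infty$ for some compact set $N_{t^*(\theta)} \subset \Lambda_\theta$ such that $t^*(\theta) \in \operatorname{int}(N_{t^*(\theta)})$; c) the functions $\mathbb{1}_{\mathbb{R}^m}, g_1,\dots,g_l$ are linearly independent in the sense that $P_0\{x \in \mathbb{R}^m \mid t_0 + \sum_{j=1}^l t_j g_j(x,\theta) \neq 0\} > 0$ for all $t \in \mathbb{R}^{1+l}$ with $t \neq 0$.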
Remark 5.2. Assumption 4.c above ensures the strict concavity of the function $t \in \Lambda_\theta \mapsto E[m(X,\theta,t)]$ on the convex set $\Lambda_\theta$, which implies that $t^*(\theta)$ is unique. It can be replaced by the following assumption: there exists a neighborhood $N_{t^*(\theta)} \subset \Lambda_\theta$ of $t^*(\theta)$ such that
$$E\Big[\sup_{t \in N_{t^*(\theta)}} \|\partial m(X,\theta,t)/\partial t\|\Big] < \infty, \qquad E\Big[\sup_{t \in N_{t^*(\theta)}} \|\partial^2 m(X,\theta,t)/\partial t^2\|\Big] < \infty,$$
and the matrix $E[\partial^2 m(X,\theta,t^*(\theta))/\partial t^2]$ is nonsingular; this also implies that $t^*(\theta)$ is unique.

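Theorem 5.4. Under Assumption 4, when $P_0 \notin \mathcal{M}_\theta$, we have

1) $\hat t(\theta)$ converges in probability to $t^*(\theta)$.

2) $\widehat{D}_\varphi(\mathcal{M}_\theta, P_0)$ converges in probability to $D_\varphi(\mathcal{M}_\theta, P_0)$.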

We now give the limiting distribution of the test statistic under $\mathcal{H}_1$. We need the following additional condition.

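Assumption 5. a) There exists $N_{t^*(\theta)} \subset \Lambda_\theta$, some compact neighborhood of $t^*(\theta)$, such that
$$E\Big[\sup_{t \in N_{t^*(\theta)}} \|\partial m(X,\theta,t)/\partial t\|\Big] < \infty, \qquad E\Big[\sup_{t \in N_{t^*(\theta)}} \|\partial^2 m(X,\theta,t)/\partial t^2\|\Big] < \infty;$$
b) as $\delta \to 0$,
$$E\Big[\sup_{\{t :\, \|t - t^*(\theta)\| < \delta\}} \big\|\partial^2 m(X,\theta,t)/\partial t^2 - \partial^2 m(X,\theta,t^*(\theta))/\partial t^2\big\|\Big] \to 0;$$
c) $E[m(X,\theta,t^*(\theta))^2] < \infty$, $E[\|\partial m(X,\theta,t^*(\theta))/\partial t\|^2] < \infty$, and the matrix $E[\partial^2 m(X,\theta,t^*(\theta))/\partial t^2]$ is nonsingular.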

Remark 5.3. Assumption 5.b is used here to relax the condition on the third derivatives (in $t$) of the function $t \mapsto m(X,\theta,t)$.

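Theorem 5.5. Under Assumptions 4 and 5, we have

1) $\sqrt{n}\,(\hat t(\theta) - t^*(\theta))$ converges in distribution to a centered normal random vector with covariance matrix
$$\big[E\, m''(X,\theta,t^*)\big]^{-1} E\big[m'(X,\theta,t^*)\, m'(X,\theta,t^*)^T\big] \big[E\, m''(X,\theta,t^*)\big]^{-1},$$
where $t^* := t^*(\theta)$, and $m'$ and $m''$ denote the first and second derivatives of $m$ with respect to $t$.

2) $\sqrt{n}\,\big(\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) - D_\varphi(\mathcal{M}_\theta,P_0)\big)$ converges in distribution to a centered normal random variable with variance
$$\sigma^2(\theta) := E\big[m(X,\theta,t^*(\theta))^2\big] - \big[E\, m(X,\theta,t^*(\theta))\big]^2.$$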

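Remark 5.4. Let $\theta$ be a given value in $\Theta$. Consider the test of the null hypothesis
$$\mathcal{H}_0 : P_0 \in \mathcal{M}_\theta \quad \text{against} \quad \mathcal{H}_1 : P_0 \notin \mathcal{M}_\theta. \tag{5.2}$$
In view of Theorem 5.3 part 2, we reject $\mathcal{H}_0$ against $\mathcal{H}_1$, at asymptotic level $\alpha \in\, ]0,1[$, when $2n\widehat{D}_\varphi(\mathcal{M}_\theta,P_0)$ exceeds the $(1-\alpha)$-quantile $q_{(1-\alpha)}$ of the $\chi^2(l)$ distribution. Theorem 5.5 part 2 is useful to give an approximation to the power function
$$P_0 \notin \mathcal{M}_\theta \mapsto \pi(P_0) := P_0\big[2n\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) > q_{(1-\alpha)}\big].$$
We then obtain the following approximation
$$\pi(P_0) \approx 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta)}\left(\frac{q_{(1-\alpha)}}{2n} - D_\varphi(\mathcal{M}_\theta,P_0)\right)\right), \tag{5.3}$$
where $F_N$ is the cumulative distribution function of the standard normal distribution. From this approximation, we can give the approximate sample size that ensures a desired power $\beta$ for a given alternative $P_0 \notin \mathcal{M}_\theta$. Let $n_0$ be the positive root of the equation
$$\beta = 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta)}\left(\frac{q_{(1-\alpha)}}{2n} - D_\varphi(\mathcal{M}_\theta,P_0)\right)\right),$$
i.e., for a desired power $\beta > 1/2$,
$$n_0 = \frac{(a+b) + \sqrt{a(a+2b)}}{2D_\varphi(\mathcal{M}_\theta,P_0)^2},$$
with $a := \sigma(\theta)^2\,[F_N^{-1}(1-\beta)]^2$ and $b := q_{(1-\alpha)}\, D_\varphi(\mathcal{M}_\theta,P_0)$. The required sample size is then $\lfloor n_0 \rfloor + 1$, where $\lfloor n_0 \rfloor$ denotes the integer part of $n_0$.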
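The closed form for $n_0$ can be checked by treating the power equation as a quadratic in $\sqrt{n}$. This is a short verification added for the reader, not part of the original argument. The shorthand $z := F_N^{-1}(1-\beta)$, $\sigma := \sigma(\theta)$, $D := D_\varphi(\mathcal{M}_\theta,P_0) > 0$ and $q := q_{(1-\alpha)}$ is introduced only for this derivation:

$$
\beta = 1 - F_N\!\left(\frac{\sqrt{n}}{\sigma}\Big(\frac{q}{2n} - D\Big)\right)
\iff z\,\sigma\sqrt{n} = \frac{q}{2} - nD
\iff D\,(\sqrt{n})^2 + z\sigma\,\sqrt{n} - \frac{q}{2} = 0 .
$$

The product of the roots is $-q/(2D) < 0$, so exactly one root in $\sqrt{n}$ is positive:

$$
\sqrt{n_0} = \frac{-z\sigma + \sqrt{z^2\sigma^2 + 2qD}}{2D},
\qquad
n_0 = \frac{(a+b) - z\sigma\sqrt{a+2b}}{2D^2},
\qquad a = \sigma^2 z^2,\; b = qD .
$$

For $\beta > 1/2$ we have $z < 0$, hence $-z\sigma\sqrt{a+2b} = \sqrt{a(a+2b)}$ and $n_0 = \big((a+b) + \sqrt{a(a+2b)}\big)/(2D^2)$. The other root of the squared equation, $\big((a+b) - \sqrt{a(a+2b)}\big)/(2D^2)$, corresponds to $z > 0$, that is, to a power below $1/2$.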



Remark 5.5. (Generalized empirical likelihood ratio test). For testing $\mathcal{H}_0 : P_0 \in \mathcal{M}_\theta$ against the alternative $\mathcal{H}_1' : P_0 \in \mathcal{M} \setminus \mathcal{M}_\theta$, we propose to use the statistic
$$2nS_n := 2n\left(\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) - \inf_{\theta'\in\Theta} \widehat{D}_\varphi(\mathcal{M}_{\theta'},P_0)\right), \tag{5.4}$$
which converges in distribution to a $\chi^2(d)$ random variable under $\mathcal{H}_0$ when Assumptions 1 and 2 hold. This can be proved using arguments similar to those of Theorems 5.2 and 5.3. We then reject $\mathcal{H}_0$ at asymptotic level $\alpha$ when $2nS_n > q_{(1-\alpha)}$, the $(1-\alpha)$-quantile of the $\chi^2(d)$ distribution. Under $\mathcal{H}_1'$ and when Assumptions 1, 2, 4 and 5 hold, it can be proved, as in Theorem 5.5, that
$$\sqrt{n}\left(S_n - D_\varphi(\mathcal{M}_\theta,P_0)\right) \tag{5.5}$$
converges in distribution to a centered normal random variable with variance
$$\sigma^2(\theta) := E\big[m(X,\theta,t^*(\theta))^2\big] - \big[E\, m(X,\theta,t^*(\theta))\big]^2.$$
So, as in the above Remark, we obtain the following approximation
$$\pi(P_0) \approx 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta)}\left(\frac{q_{(1-\alpha)}}{2n} - D_\varphi(\mathcal{M}_\theta,P_0)\right)\right) \tag{5.6}$$
to the power function $P_0 \in \mathcal{M}\setminus\mathcal{M}_\theta \mapsto \pi(P_0) := P_0\big[2nS_n > q_{(1-\alpha)}\big]$. The approximate sample size required to achieve a desired power for a given alternative can be obtained in a similar way.

Remark 5.6. (Confidence region for the parameter). For a fixed level $\alpha \in\, ]0,1[$, using the convergence in distribution of the statistic (5.4) under $\mathcal{H}_0$, the set
$$\{\theta \in \Theta \text{ such that } 2nS_n \le q_{(1-\alpha)}\}$$
is an asymptotic confidence region for $\theta_0$, where $q_{(1-\alpha)}$ is the $(1-\alpha)$-quantile of the $\chi^2(d)$ distribution. It is straightforward to see that the confidence region obtained for the $KL_m$-divergence coincides with that of Owen (1991) and Qin and Lawless (1994).
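The region can be computed explicitly in the simplest just-identified case: a scalar mean with $g(x,\theta) = x - \theta$, so $l = d = 1$. There, $\inf_{\theta} \widehat{D}_\varphi(\mathcal{M}_\theta,P_0) = 0$ is attained at the sample mean, and $2nS_n = 2n\widehat{D}_\varphi(\mathcal{M}_\theta,P_0)$. For the $KL_m$ (EL) choice $\varphi(x) = -\log x + x - 1$, we assume the standard reduction of this statistic to Owen's dual form $2\max_\lambda \sum_{i=1}^n \log(1 + \lambda(X_i - \theta))$. The minimal sketch below uses that form to approximate the region of Remark 5.6 on a grid. The function names, the bisection solver and the grid are illustrative choices, not the authors' implementation.

```python
import math
from statistics import NormalDist, fmean


def el_statistic(x, theta, iters=100):
    """Hypothetical helper: 2n * D_hat for phi(x) = -log x + x - 1, g(x, theta) = x - theta.

    Uses the (assumed) dual form 2 * max_lam sum_i log(1 + lam * (x_i - theta));
    returns math.inf when theta is not strictly inside (min(x), max(x)).
    """
    z = [xi - theta for xi in x]
    zmin, zmax = min(z), max(z)
    if zmin >= 0 or zmax <= 0:
        return math.inf
    # Feasible multipliers keep every 1 + lam * z_i > 0: lam in (-1/zmax, -1/zmin).
    lo, hi = -1.0 / zmax, -1.0 / zmin
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(iters):
        # The score sum z_i / (1 + lam * z_i) is strictly decreasing in lam: bisect its root.
        mid = 0.5 * (lo + hi)
        if sum(zi / (1.0 + mid * zi) for zi in z) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def el_confidence_interval(x, alpha=0.05, grid_size=400):
    """Grid approximation of {theta : 2n S_n <= q_(1-alpha)} for the mean (l = d = 1)."""
    # The chi2(1) (1 - alpha)-quantile is the squared N(0, 1) (1 - alpha/2)-quantile.
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    a, b = min(x), max(x)
    grid = [a + (b - a) * k / grid_size for k in range(1, grid_size)]
    accepted = [theta for theta in grid if el_statistic(x, theta) <= q]
    return min(accepted), max(accepted), q


if __name__ == "__main__":
    x = [2.1, 3.4, 1.9, 5.0, 2.8, 3.3, 4.1, 2.5, 3.9, 3.0]
    print(el_statistic(x, fmean(x)))  # ~0: the sample mean satisfies the constraint
    print(el_confidence_interval(x))
```

On the ten illustrative observations in the `__main__` block, the statistic vanishes at the sample mean. The returned interval lies strictly inside the sample range, as an EL interval for a mean must. Bisection is used because the score is monotone in the scalar multiplier; a Newton step would serve equally well.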

5.3. Asymptotic properties under misspecification. We address Problem 1 by stating the limiting distribution of the proposed test statistics under the alternative $\mathcal{H}_1 : P_0 \notin \mathcal{M}$. This requires the introduction of $Q^*_{\theta^*}$, the projection of $P_0$ on $\mathcal{M}$. Assumption 6 below ensures the existence of the "pseudo-true" value $\theta^*$ as well as the existence of the projection $Q^*_{\theta^*}$ of $P_0$ on $\mathcal{M}$, and states some other necessary regularity conditions. Proposition 4.2 above states the existence and characterization of the projection $Q^*_\theta$ of $P_0$ on $\mathcal{M}_\theta$, for a given $\theta \in \Theta$.

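Assumption 6. a) $\Theta$ is compact, and $\theta^* := \arg\inf_{\theta\in\Theta} \sup_{t\in\Lambda_\theta} E[m(X,\theta,t)]$ exists and is unique;

b) $g(X,\theta)$ is continuous at each $\theta \in \Theta$ with probability one;

c) $E\big[\sup_{\{\theta\in\Theta,\ t\in N_{t^*(\theta)}\}} |m(X,\theta,t)|\big] < \infty$, where $N_{t^*(\theta)} \subset \Lambda_\theta$ is a compact set such that $t^*(\theta) \in \operatorname{int}(N_{t^*(\theta)})$; d) for all $\theta \in \Theta$, the functions $\mathbb{1}_{\mathbb{R}^m}, g_1, \ldots, g_l$ are linearly independent in the sense that $P_0\big\{x \in \mathbb{R}^m \mid t_0 + \sum_{j=1}^l t_j g_j(x,\theta) \neq 0\big\} > 0$, for all $t \in \mathbb{R}^{1+l}$ with $t \neq 0$.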

Remark 5.7. Assumption 6.d ensures the strict concavity of the function $t \in \Lambda_\theta \mapsto E[m(X,\theta,t)]$ on the convex set $\Lambda_\theta$, which implies the uniqueness of $t^*(\theta)$, for all $\theta \in \Theta$. This assumption can be replaced by the following one: for all $\theta \in \Theta$, there exists a neighborhood $N_{t^*(\theta)}$ of $t^*(\theta)$ such that
$$E\Big[\sup_{t\in N_{t^*(\theta)}} \|\partial m(X,\theta,t)/\partial t\|\Big] < \infty, \qquad E\Big[\sup_{t\in N_{t^*(\theta)}} \|\partial^2 m(X,\theta,t)/\partial t^2\|\Big] < \infty,$$
and the matrix $E[\partial^2 m(X,\theta,t^*(\theta))/\partial t^2]$ is nonsingular, which implies the uniqueness of $t^*(\theta)$, for all $\theta \in \Theta$.

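Theorem 5.6. Under Assumption 6, we have

1) $\|\hat t(\theta) - t^*(\theta)\|$ converges in probability to 0 uniformly in $\theta \in \Theta$;

2) $\hat\theta_\varphi$ converges in probability to $\theta^*$;

3) $\widehat{D}_\varphi(\mathcal{M},P_0)$ converges in probability to $D_\varphi(\mathcal{M},P_0)$.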

The asymptotic normality of the test statistics under misspecification requires the following additional conditions.

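Assumption 7. a) $\theta^* \in \operatorname{int}(\Theta)$; b) there exists $\mathcal{N} \subset \Theta \times \Lambda_\theta$, some compact neighborhood of $(\theta^*, t^*(\theta^*))$, such that, with probability one, $(\theta,t) \in \mathcal{N} \mapsto m(X,\theta,t)$ is $C^2$ and
$$E\Big[\sup_{(\theta,t)\in\mathcal{N}} \|\partial m(X,\theta,t)/\partial(\theta,t)\|\Big] < \infty, \qquad E\Big[\sup_{(\theta,t)\in\mathcal{N}} \|\partial^2 m(X,\theta,t)/\partial(\theta,t)^2\|\Big] < \infty;$$
c) as $\delta \to 0$,
$$E\Big[\sup_{\{(t,\theta):\, \|(t,\theta) - (t^*(\theta^*),\theta^*)\| < \delta\}} \big\|\partial^2 m(X,\theta,t)/\partial(\theta,t)^2 - \partial^2 m(X,\theta^*,t^*(\theta^*))/\partial(\theta,t)^2\big\|\Big] \to 0;$$
d) $E[m(X,\theta^*,t^*(\theta^*))^2]$, $E[\|\partial m(X,\theta^*,t^*(\theta^*))/\partial t\|^2]$ and $E[\|\partial m(X,\theta^*,t^*(\theta^*))/\partial\theta\|^2]$ are finite, and the matrix
$$S := \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}$$
is nonsingular, where $S_{11} := E[\partial^2 m(X,\theta^*,t^*(\theta^*))/\partial t^2]$, $S_{12} = S_{21}^T := E[\partial^2 m(X,\theta^*,t^*(\theta^*))/\partial t\,\partial\theta]$ and $S_{22} := E[\partial^2 m(X,\theta^*,t^*(\theta^*))/\partial\theta^2]$.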

Remark 5.8. Assumption 7.c is used here to relax the condition on the third derivatives (in $t$ and $\theta$) of the function $(\theta,t) \mapsto m(X,\theta,t)$.

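Theorem 5.7. Under Assumptions 6 and 7, we have

1) The vector
$$\sqrt{n}\begin{pmatrix} \hat t(\hat\theta_\varphi) - t^*(\theta^*) \\ \hat\theta_\varphi - \theta^* \end{pmatrix}$$
converges in distribution to a centered normal random vector with covariance matrix
$$W := S^{-1} M S^{-1},$$
where $S$ is the matrix of Assumption 7.d and
$$M := E\left[\begin{pmatrix} \frac{\partial}{\partial t} m(X,\theta^*,t^*(\theta^*)) \\ \frac{\partial}{\partial \theta} m(X,\theta^*,t^*(\theta^*)) \end{pmatrix}\begin{pmatrix} \frac{\partial}{\partial t} m(X,\theta^*,t^*(\theta^*)) \\ \frac{\partial}{\partial \theta} m(X,\theta^*,t^*(\theta^*)) \end{pmatrix}^T\right].$$

2) $\sqrt{n}\,\big(\widehat{D}_\varphi(\mathcal{M},P_0) - D_\varphi(\mathcal{M},P_0)\big)$ converges in distribution to a centered normal random variable with variance
$$\sigma^2(\theta^*) := E\big[m(X,\theta^*,t^*(\theta^*))^2\big] - \big[E\, m(X,\theta^*,t^*(\theta^*))\big]^2.$$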
Remark 5.9. In the case of EL, i.e., when $\varphi(x) = -\log x + x - 1$, Assumption 6.c implies that
$$-\infty < \inf\Big(t_0 + \sum_{j=1}^l t_j g_j(x,\theta)\Big) < \sup\Big(t_0 + \sum_{j=1}^l t_j g_j(x,\theta)\Big) < 1 \tag{5.7}$$
for all $x \in \mathbb{R}^m$, $P_0$-a.s., for all $\theta \in \Theta$ and for all $t \in N_{t^*(\theta)}$. This imposes a restriction on the model when the support of $P_0$ and the functions $g_j$ are unbounded. Indeed, when the support of $P_0$ is, for example, the whole space $\mathbb{R}^m$, the condition above does not hold when $g$ is unbounded. In this case, the EL estimate may cease to be consistent under misspecification, as stated by Schennach (2007). This is a potential problem for all divergences associated with $\varphi$-functions whose domain is of the form $]a,+\infty[$, $]-\infty,b[$ or $]a,b[$, where $a$ and $b$ are finite real numbers; this is the case of the modified $\chi^2$, Hellinger, KL and modified KL divergences. On the contrary, Assumption 6.c may be satisfied for other divergences associated with $\varphi$-functions with $\operatorname{dom}\varphi = \mathbb{R}$, which is the case of the $\chi^2$ divergence, for example.
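The upper bound $1$ in (5.7) can be traced to the convex conjugate of the EL divergence function. Assume, as in the duality representation on which the estimates rest, that $m(x,\theta,t) = t_0 - \psi\big(t_0 + \sum_{j=1}^l t_j g_j(x,\theta)\big)$, where $\psi(u) := \sup_x \{ux - \varphi(x)\}$ is the convex conjugate of $\varphi$. For $\varphi(x) = -\log x + x - 1$, a direct computation gives

$$
\psi(u) = \sup_{x>0}\big\{(u-1)x + \log x + 1\big\} =
\begin{cases}
-\log(1-u), & u < 1,\\
+\infty, & u \ge 1,
\end{cases}
$$

The supremum is attained at $x = 1/(1-u)$ when $u < 1$. Hence $m(x,\theta,t)$ is finite only where $t_0 + \sum_{j=1}^l t_j g_j(x,\theta) < 1$. Under this assumed form of $m$, the integrability required in Assumption 6.c cannot hold unless this inequality is satisfied $P_0$-a.s. over $N_{t^*(\theta)}$, which is what the upper bound in (5.7) expresses.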

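Remark 5.10. Theorem 5.7 part 2 is useful for the computation of the power function. For testing the null hypothesis $\mathcal{H}_0 : P_0 \in \mathcal{M}$ against the alternative $\mathcal{H}_1 : P_0 \notin \mathcal{M}$, the power function is
$$P_0 \notin \mathcal{M} \mapsto \pi(P_0) := P_0\big[2n\widehat{D}_\varphi(\mathcal{M},P_0) > q_{(1-\alpha)}\big]. \tag{5.8}$$
Using Theorem 5.7 part 2, we obtain the following approximation to the power function (5.8):
$$\pi(P_0) \approx 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta^*)}\left(\frac{q_{(1-\alpha)}}{2n} - D_\varphi(\mathcal{M},P_0)\right)\right), \tag{5.9}$$
where $F_N$ is the cumulative distribution function of the standard normal distribution. From this proxy value of $\pi(P_0)$, the approximate sample size that ensures a given power $\beta$ for a given alternative $P_0 \notin \mathcal{M}$ can be obtained as follows. Let $n_0$ be the positive root of the equation
$$\beta = 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta^*)}\left(\frac{q_{(1-\alpha)}}{2n} - D_\varphi(\mathcal{M},P_0)\right)\right),$$
i.e., for $\beta > 1/2$,
$$n_0 = \frac{(a+b) + \sqrt{a(a+2b)}}{2D_\varphi(\mathcal{M},P_0)^2},$$
where $a := \sigma(\theta^*)^2\,[F_N^{-1}(1-\beta)]^2$ and $b := q_{(1-\alpha)}\, D_\varphi(\mathcal{M},P_0)$. The required sample size is then $\lfloor n_0 \rfloor + 1$.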
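The approximation (5.9) and the sample-size rule can be evaluated in a few lines of Python. The sketch below is generic in $D = D_\varphi(\mathcal{M},P_0)$, $\sigma = \sigma(\theta^*)$ and $q = q_{(1-\alpha)}$, which the user must supply; in practice they would be estimated. The function names are hypothetical. The closed form used for $n_0$ is the root that matches a desired power $\beta > 1/2$, i.e. $F_N^{-1}(1-\beta) < 0$.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal; F_N in the text


def approx_power(n, D, sigma, q):
    """Normal approximation (5.9): 1 - F_N(sqrt(n)/sigma * (q/(2n) - D))."""
    return 1.0 - N01.cdf(math.sqrt(n) / sigma * (q / (2.0 * n) - D))


def approx_sample_size(beta, D, sigma, q):
    """Root n0 of approx_power(n) = beta, and the integer floor(n0) + 1.

    Valid for 1/2 < beta < 1 and D, sigma, q > 0 (hence the '+' root below).
    """
    if not (0.5 < beta < 1.0 and D > 0 and sigma > 0 and q > 0):
        raise ValueError("need 1/2 < beta < 1 and D, sigma, q > 0")
    z = N01.inv_cdf(1.0 - beta)  # negative because beta > 1/2
    a = (sigma * z) ** 2
    b = q * D
    n0 = ((a + b) + math.sqrt(a * (a + 2.0 * b))) / (2.0 * D ** 2)
    return n0, math.floor(n0) + 1


if __name__ == "__main__":
    # Illustrative inputs only (not values from the paper); q is the 0.95-quantile of chi2(1).
    q = N01.inv_cdf(0.975) ** 2
    n0, n_req = approx_sample_size(0.9, D=0.05, sigma=0.3, q=q)
    print(n0, n_req, approx_power(n_req, 0.05, 0.3, q))
```

With the illustrative inputs in the `__main__` block ($D = 0.05$, $\sigma = 0.3$, a $\chi^2(1)$ quantile at $\alpha = 0.05$, $\beta = 0.9$), one gets $n_0 \approx 124.06$ and a required sample size of 125. Plugging $n_0$ back into `approx_power` returns $\beta$, which is a quick check of the closed form.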

6. Simulation results: Approximation of the power function of the empirical likelihood ratio test

We illustrate by simulation the accuracy of the power approximation (5.9) in the case of the EL method, i.e., when $\varphi(x) = -\log x + x - 1$. Consider the test problem of the composite null hypothesis
$$\mathcal{H}_0 : P_0 \in \mathcal{M} \quad \text{against the alternative} \quad \mathcal{H}_1 : P_0 \notin \mathcal{M},$$
where $\mathcal{M} := \bigcup_{\theta\in\mathbb{R}} \mathcal{M}_\theta$ and $\mathcal{M}_\theta$ is the set of all signed finite measures (s.f.m.) $Q$ satisfying the constraints $\int dQ(x) = 1$ and $\int g(x,\theta)\, dQ(x) = 0$ with $g(x,\theta) := (x, x^2 - \theta)^T$, namely
$$\mathcal{M}_\theta := \left\{Q \text{ s.f.m. such that } \int dQ(x) = 1 \text{ and } \int g(x,\theta)\, dQ(x) = 0\right\},$$
where $\theta \in \mathbb{R}$ is the parameter of interest. We consider the asymptotic level $\alpha = 0.05$ and the alternatives $P_0 := \mathcal{U}([-1, 1+\varepsilon]) \notin \mathcal{M}$ for different values of $\varepsilon$ in the interval $]0,1]$. Note that when $\varepsilon = 0$, the uniform distribution $\mathcal{U}([-1,1])$ belongs to the model $\mathcal{M}$. For this model, we can also show that all Assumptions of Theorem 5.2 are satisfied when $\varepsilon = 0$, and that all Assumptions of Theorem 5.7 are met under the alternatives. In Figure 1, the power function (5.8) is plotted (continuous line) with sample sizes $n = 50$, $n = 100$, $n = 200$ and $n = 500$, for different values of $\varepsilon$. Each power entry was obtained by Monte Carlo from 1000 independent runs. The approximation (5.9) is plotted (dashed line) as a function of $\varepsilon$. The estimates $\hat\theta_\varphi$ and $\widehat{D}_\varphi(\mathcal{M},P_0)$ are calculated using the Newton-Raphson algorithm. We observe from Figure 1 that the approximation is accurate even for moderate sample sizes.
FIGURE 1. Approximation of the power function
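A minimal pure-Python sketch of this experiment follows. It rests on an observation of ours that the text does not state. Because $\theta$ ranges over all of $\mathbb{R}$, the constraint $\int (x^2-\theta)\, dQ(x) = 0$ only defines $\theta$, so $\mathcal{M}$ reduces to the s.f.m. $Q$ with $\int dQ = 1$ and $\int x\, dQ(x) = 0$. Under that reduction:

- $2n\widehat{D}_\varphi(\mathcal{M},P_0)$ is Owen's EL ratio statistic for a zero mean;
- $D_\varphi(\mathcal{M},P_0) = \sup_\lambda E[\log(1+\lambda X)]$;
- $\sigma^2(\theta^*) = \operatorname{Var}[\log(1+\lambda^* X)]$ at the maximizer $\lambda^*$.

The sketch computes these population quantities from closed-form uniform integrals. The helper names, the bisection solvers (used in place of Newton-Raphson), the seed and the small number of runs are illustrative choices. The sketch does not reproduce the values plotted in Figure 1.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()


def el_stat_mean_zero(x, iters=100):
    """2 * max_lam sum_i log(1 + lam * x_i): EL ratio statistic for E[X] = 0."""
    lo_x, hi_x = min(x), max(x)
    if lo_x >= 0 or hi_x <= 0:
        return math.inf  # 0 is outside the convex hull of the sample
    lo, hi = -1.0 / hi_x, -1.0 / lo_x  # keeps 1 + lam * x_i > 0 for all i
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(iters):  # bisection on the strictly decreasing score
        mid = 0.5 * (lo + hi)
        if sum(xi / (1.0 + mid * xi) for xi in x) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * xi) for xi in x)


def uniform_moments(lam, a, b):
    """E[X/(1+lam X)], E[log(1+lam X)], E[log(1+lam X)^2] for X ~ U[a, b], lam != 0."""
    ua, ub = 1.0 + lam * a, 1.0 + lam * b  # both assumed > 0
    w = (b - a) * lam

    def f1(u):  # antiderivative of log(u)
        return u * math.log(u) - u

    def f2(u):  # antiderivative of log(u)^2
        return u * math.log(u) ** 2 - 2.0 * u * math.log(u) + 2.0 * u

    score = ((b - a) - math.log(ub / ua) / lam) / w
    return score, (f1(ub) - f1(ua)) / w, (f2(ub) - f2(ua)) / w


def population_quantities(eps, iters=200):
    """lam*, D = E log(1 + lam* X) and sigma = sd log(1 + lam* X), X ~ U[-1, 1 + eps]."""
    a, b = -1.0, 1.0 + eps
    lo, hi = 0.0, 1.0 - 1e-12  # E[X] = eps/2 > 0 puts lam* in ]0, 1[
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if uniform_moments(mid, a, b)[0] > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    _, e1, e2 = uniform_moments(lam, a, b)
    return lam, e1, math.sqrt(e2 - e1 ** 2)


def approx_power(eps, n, alpha=0.05):
    """Approximation (5.9) with q the (1 - alpha)-quantile of chi2(l - d) = chi2(1)."""
    q = N01.inv_cdf(1.0 - alpha / 2.0) ** 2
    _, D, sigma = population_quantities(eps)
    return 1.0 - N01.cdf(math.sqrt(n) / sigma * (q / (2.0 * n) - D))


def mc_power(eps, n, runs=200, alpha=0.05, seed=0):
    """Monte Carlo rejection rate of the EL test at level alpha under U[-1, 1 + eps]."""
    q = N01.inv_cdf(1.0 - alpha / 2.0) ** 2
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        x = [rng.uniform(-1.0, 1.0 + eps) for _ in range(n)]
        rejections += int(el_stat_mean_zero(x) > q)
    return rejections / runs


if __name__ == "__main__":
    for eps in (0.25, 0.5, 1.0):
        print(eps, round(mc_power(eps, 50), 3), round(approx_power(eps, 50), 3))
```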
7. Concluding remarks and possible developments

We have proposed new estimates and tests for models satisfying linear constraints with unknown parameter through divergence-based methods which generalize the EL approach. This leads to the limiting distributions of the test statistics and of the estimates under alternatives and under misspecification. Consistency of the test statistics under the alternatives is the starting point for the study of the optimality of the tests through the Bahadur approach; also, the generalized Neyman-Pearson optimality of the EL test (as developed by Kitamura (2001)) can be adapted to empirical divergence-based methods.
Many problems remain to be studied in the future, such as the choice of the divergence which leads to an optimal (in some sense) estimator or test in terms of efficiency and/or robustness. Preliminary simulation results show that the Hellinger divergence enjoys good properties in terms of efficiency-robustness; see Broniatowski and Keziou (2008). Also, comparisons under local alternatives should be developed.

8. Appendix 

Proof of Theorem 5.1.

The same arguments used for the proof of Theorem 3.1 in Newey and Smith (2004) hold when their criterion function $(\theta,\lambda) \mapsto \sum_{i=1}^n \rho(\lambda^T g(X_i,\theta))$ is replaced by our function $(\theta,t) \mapsto \sum_{i=1}^n m(X_i,\theta,t)$. In particular, we have
$$\max_{1 \le i \le n} \big|\hat t(\hat\theta_\varphi)^T g(X_i,\hat\theta_\varphi)\big| \to 0$$
in probability, which implies that $\hat t(\hat\theta_\varphi) \in \operatorname{int}(\Lambda^{(n)}_{\hat\theta_\varphi})$ with probability one as $n \to \infty$, since $a^* < 0 < b^*$.



Proof of Theorem 5.2. 

The proof is similar to that of Newey and Smith (2004) Theorem 3.2. Hence, it is omitted. 
Proof of Theorem 5.3. 

It is a particular case of Theorem 5.1 taking $\Theta = \{\theta\}$. Hence, the proof is omitted.
Proof of Theorem 5.4.

1) First, note that $t^*(\theta)$ exists and is unique by Assumption 4. By the uniform weak law of large numbers (UWLLN), using the continuity of $m(X,\theta,t)$ in $t$ and Assumption 4.b, we obtain
$$|P_n m(\theta,t) - E[m(X,\theta,t)]| \to 0, \tag{8.1}$$
in probability, uniformly in $t$ over the compact set $N_{t^*(\theta)}$. Using this, the fact that $t^*(\theta) := \arg\sup_{t\in\Lambda_\theta} P_0 m(\theta,t)$ is unique and belongs to $\operatorname{int}(N_{t^*(\theta)})$, and the strict concavity of $t \mapsto P_0 m(\theta,t)$, we conclude that any value
$$\tilde t := \arg\sup_{t\in N_{t^*(\theta)}} P_n m(\theta,t) \tag{8.2}$$
converges in probability to $t^*(\theta)$; see e.g. Theorem 5.7 in van der Vaart (1998). We then end the proof by showing that $\hat t(\theta)$ belongs to $\operatorname{int}(N_{t^*(\theta)})$ with probability one as $n \to \infty$, and therefore converges to $t^*(\theta)$. Indeed, since for $n$ sufficiently large any value $\tilde t$ lies in the interior of $N_{t^*(\theta)}$, the concavity of $t \mapsto P_n m(\theta,t)$ implies that no point $t$ in the complement of $\operatorname{int}(N_{t^*(\theta)})$ can maximize $P_n m(\theta,t)$ over $t \in \mathbb{R}^{1+l}$; hence $\hat t(\theta)$ must belong to $\operatorname{int}(N_{t^*(\theta)})$.

2) With probability tending to 1 as $n \to \infty$, we have $\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) = P_n m(\theta,\hat t) = P_n m(\theta,\tilde t)$. Hence, we can write
$$\big|\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) - D_\varphi(\mathcal{M}_\theta,P_0)\big| = \big|P_n m(\theta,\tilde t) - P_0 m(\theta,t^*(\theta))\big| =: |A|,$$
and
$$P_n m(\theta,t^*(\theta)) - P_0 m(\theta,t^*(\theta)) \le A \le P_n m(\theta,\tilde t) - P_0 m(\theta,\tilde t).$$
Both the RHS and the LHS in the above display tend to 0 in probability by (8.1). Hence, $\big|\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) - D_\varphi(\mathcal{M}_\theta,P_0)\big|$ tends to 0 in probability as $n \to \infty$. This ends the proof.



Proof of Theorem 5.5.

1) For $n$ sufficiently large, by a Taylor expansion, there exists $\bar t \in \mathbb{R}^{1+l}$ inside the segment that links $\hat t$ and $t^*(\theta)$ with
$$0 = P_n m'(\theta,\hat t) = P_n m'(\theta,t^*(\theta)) + \big(P_n m''(\theta,\bar t)\big)^T (\hat t - t^*(\theta)). \tag{8.3}$$
By Assumptions 5.a and 5.b, using the fact that $\bar t = t^*(\theta) + o_P(1)$ and the UWLLN, we can prove that
$$P_n m''(\theta,\bar t) = P_0 m''(\theta,t^*(\theta)) + o_P(1).$$
Using this display, one gets from (8.3)
$$-P_n m'(\theta,t^*(\theta)) = \big(P_0 m''(\theta,t^*(\theta)) + o_P(1)\big)(\hat t - t^*(\theta)). \tag{8.4}$$




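Assumptions 4.a and 5.a imply that $P_0 m'(\theta,t^*(\theta)) = 0$. Hence, by the central limit theorem (CLT), we have
$$\sqrt{n}\, P_n m'(\theta,t^*(\theta)) = O_P(1),$$
which by (8.4) implies that $\sqrt{n}(\hat t - t^*(\theta)) = O_P(1)$. Hence, from (8.4), we get
$$\sqrt{n}(\hat t - t^*(\theta)) = \big[-P_0 m''(\theta,t^*(\theta))\big]^{-1} \sqrt{n}\, P_n m'(\theta,t^*(\theta)) + o_P(1). \tag{8.5}$$
The CLT and Slutsky's theorem conclude the proof of part 1.

2) Using the facts that $\hat t - t^*(\theta) = O_P(1/\sqrt{n})$ and $P_n m'(\theta,t^*(\theta)) = P_0 m'(\theta,t^*(\theta)) + o_P(1) = 0 + o_P(1) = o_P(1)$, we obtain
$$\begin{aligned}\sqrt{n}\big(\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) - D_\varphi(\mathcal{M}_\theta,P_0)\big) &= \sqrt{n}\big(P_n m(\theta,\hat t) - P_0 m(\theta,t^*(\theta))\big) \\ &= \sqrt{n}\big(P_n m(\theta,t^*(\theta)) - P_0 m(\theta,t^*(\theta))\big) + o_P(1),\end{aligned}$$
and the CLT and Slutsky's theorem conclude the proof.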
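Proof of Theorem 5.6.

1) First note that Assumption 6.d implies that the function $t \in \Lambda_\theta \mapsto E[m(X,\theta,t)]$ is strictly concave for all $\theta \in \Theta$, which implies that $t^*(\theta)$ is unique for all $\theta \in \Theta$. By the UWLLN, using the continuity of $m(X,\theta,t)$ in $\theta$ and $t$, and Assumption 6.c, we obtain the uniform convergence in probability, over the compact set $\{(\theta,t) \in \Theta \times \mathbb{R}^{1+l};\ \theta \in \Theta,\ t \in N_{t^*(\theta)}\}$,
$$\sup_{\{\theta\in\Theta,\ t \in N_{t^*(\theta)}\}} |P_n m(\theta,t) - P_0 m(\theta,t)| \to 0. \tag{8.6}$$
We can then prove the convergence in probability $\sup_{\theta\in\Theta} \|\hat t(\theta) - t^*(\theta)\| \to 0$ in two steps.

Step 1: Let $\eta > 0$. We will show that $P_0\big[\sup_{\theta\in\Theta} \|\tilde t(\theta) - t^*(\theta)\| > \eta\big] \to 0$ for any value
$$\tilde t(\theta) := \arg\sup_{t\in N_{t^*(\theta)}} P_n m(\theta,t). \tag{8.7}$$
Step 2: To conclude the proof, we will show that $\hat t(\theta)$ belongs to $\operatorname{int}(N_{t^*(\theta)})$ with probability one as $n \to \infty$ for all $\theta \in \Theta$.

Suppose that $\sup_{\theta\in\Theta} \|\tilde t(\theta) - t^*(\theta)\| > \eta$. Since $\Theta$ is a compact set, by continuity there exists $\bar\theta \in \Theta$ such that $\sup_{\theta\in\Theta} \|\tilde t(\theta) - t^*(\theta)\| = \|\tilde t(\bar\theta) - t^*(\bar\theta)\| > \eta$. Hence, there exists $\epsilon > 0$ such that $P_0 m(\bar\theta,t^*(\bar\theta)) - P_0 m(\bar\theta,\tilde t(\bar\theta)) \ge \epsilon$. In fact, $\epsilon$ may be defined as follows
$$\epsilon := \inf_{\theta\in\Theta}\ \inf_{\{t \in N_{t^*(\theta)};\ \|t - t^*(\theta)\| > \eta\}} \big(E[m(X,\theta,t^*(\theta))] - E[m(X,\theta,t)]\big),$$
which is strictly positive by the strict concavity of $E[m(X,\theta,t)]$ in $t$ for all $\theta \in \Theta$, the uniqueness of $t^*(\theta) \in \operatorname{int}(N_{t^*(\theta)})$ and the fact that $\Theta$ is compact. Hence the event $\big[\sup_{\theta\in\Theta} \|\tilde t(\theta) - t^*(\theta)\| > \eta\big]$ implies the event
$$\big[P_0 m(\bar\theta,t^*(\bar\theta)) - P_0 m(\bar\theta,\tilde t(\bar\theta)) \ge \epsilon\big],$$
from which we obtain
$$P_0\Big[\sup_{\theta\in\Theta} \|\tilde t(\theta) - t^*(\theta)\| > \eta\Big] \le P_0\big[P_0 m(\bar\theta,t^*(\bar\theta)) - P_0 m(\bar\theta,\tilde t(\bar\theta)) \ge \epsilon\big]. \tag{8.8}$$
On the other hand, by (8.6), we have
$$\begin{aligned} P_0 m(\bar\theta,t^*(\bar\theta)) - P_0 m(\bar\theta,\tilde t(\bar\theta)) &= P_n m(\bar\theta,t^*(\bar\theta)) - P_0 m(\bar\theta,\tilde t(\bar\theta)) + o_P(1) \\ &\le P_n m(\bar\theta,\tilde t(\bar\theta)) - P_0 m(\bar\theta,\tilde t(\bar\theta)) + o_P(1) \\ &\le \sup_{\{\theta\in\Theta,\ t\in N_{t^*(\theta)}\}} |P_n m(\theta,t) - P_0 m(\theta,t)| + o_P(1). \end{aligned}$$
Combining this with (8.8) and (8.6), we conclude that
$$\sup_{\theta\in\Theta} \|\tilde t(\theta) - t^*(\theta)\| \to 0 \tag{8.9}$$
in probability. In particular, $\tilde t(\theta) \in \operatorname{int}(N_{t^*(\theta)})$ for sufficiently large $n$, uniformly in $\theta \in \Theta$. Since $t \mapsto P_n m(\theta,t)$ is concave, the maximizer $\hat t(\theta)$ belongs to $\operatorname{int}(N_{t^*(\theta)})$ for sufficiently large $n$; hence the same result (8.9) holds when $\tilde t(\theta)$ is replaced by $\hat t(\theta)$.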
2) From part 1, we have, for large $n$,
$$\sup_{\theta\in\Theta} |P_n m(\theta,\hat t(\theta)) - P_0 m(\theta,t^*(\theta))| = \sup_{\theta\in\Theta} |P_n m(\theta,\tilde t(\theta)) - P_0 m(\theta,t^*(\theta))| =: \sup_{\theta\in\Theta} |B|.$$
On the other hand, we have
$$P_n m(\theta,t^*(\theta)) - P_0 m(\theta,t^*(\theta)) \le B \le P_n m(\theta,\tilde t(\theta)) - P_0 m(\theta,\tilde t(\theta)).$$
By Assumption 6.c and the convergence in probability $\sup_{\theta\in\Theta} \|\hat t(\theta) - t^*(\theta)\| \to 0$, both the RHS and the LHS of the above display tend to 0 in probability, uniformly in $\theta \in \Theta$, by the UWLLN. Hence, $\sup_{\theta\in\Theta} |P_n m(\theta,\hat t(\theta)) - P_0 m(\theta,t^*(\theta))| \to 0$ in probability. Now, since the minimizer $\theta^*$ of $\theta \mapsto P_0 m(\theta,t^*(\theta))$ over the compact set $\Theta$ is unique and is an interior point of $\Theta$, by continuity and the above uniform convergence, we conclude that $\hat\theta_\varphi$ tends in probability to $\theta^*$; see e.g. Theorem 5.7 in van der Vaart (1998).

3) This holds as a consequence of the uniform convergence in probability
$$\sup_{\theta\in\Theta} |P_n m(\theta,\hat t(\theta)) - P_0 m(\theta,t^*(\theta))| \to 0 \tag{8.10}$$
proved in part 2 above. In fact, we have for $n$ sufficiently large
$$\big|\widehat{D}_\varphi(\mathcal{M},P_0) - D_\varphi(\mathcal{M},P_0)\big| = \big|P_n m(\hat\theta_\varphi,\hat t(\hat\theta_\varphi)) - P_0 m(\theta^*,t^*(\theta^*))\big| =: |C|,$$
with
$$P_n m(\hat\theta_\varphi,\hat t(\hat\theta_\varphi)) - P_0 m(\hat\theta_\varphi,t^*(\hat\theta_\varphi)) \le C \le P_n m(\theta^*,\hat t(\theta^*)) - P_0 m(\theta^*,t^*(\theta^*)),$$
and both the RHS and the LHS tend to 0 in probability by (8.10). This concludes the proof.

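Proof of Theorem 5.7.

1) By the first order conditions, with probability tending to one, we have
$$\begin{cases} P_n \frac{\partial}{\partial t} m(\hat\theta_\varphi, \hat t(\hat\theta_\varphi)) = 0, \\[4pt] P_n \frac{\partial}{\partial\theta} m(\hat\theta_\varphi, \hat t(\hat\theta_\varphi)) + \frac{\partial}{\partial\theta}\hat t(\hat\theta_\varphi)^T\, P_n \frac{\partial}{\partial t} m(\hat\theta_\varphi, \hat t(\hat\theta_\varphi)) = 0. \end{cases}$$
The second term in the LHS of the second equation is equal to 0, due to the first equation. Hence, $\hat t(\hat\theta_\varphi)$ and $\hat\theta_\varphi$ are solutions of the somewhat simpler system
$$P_n \frac{\partial}{\partial t} m(\theta,t) = 0, \tag{8.11}$$
$$P_n \frac{\partial}{\partial\theta} m(\theta,t) = 0. \tag{8.12}$$
Using a Taylor expansion of (8.11) in $(\theta,t)$ around $(\theta^*,t^*(\theta^*))$, there exists $(\bar\theta,\bar t)$ inside the segment that links $(\hat\theta_\varphi,\hat t(\hat\theta_\varphi))$ and $(\theta^*,t^*(\theta^*))$ such that
$$0 = P_n \frac{\partial}{\partial t} m(\theta^*,t^*(\theta^*)) + \left[P_n \frac{\partial^2}{\partial t^2} m(\bar\theta,\bar t),\ P_n \frac{\partial^2}{\partial t\,\partial\theta} m(\bar\theta,\bar t)\right] a_n, \tag{8.13}$$
with
$$a_n := \begin{pmatrix} \hat t(\hat\theta_\varphi) - t^*(\theta^*) \\ \hat\theta_\varphi - \theta^* \end{pmatrix}. \tag{8.14}$$
By Assumption 7, using the UWLLN, we can write
$$\left[P_n \frac{\partial^2}{\partial t^2} m(\bar\theta,\bar t),\ P_n \frac{\partial^2}{\partial t\,\partial\theta} m(\bar\theta,\bar t)\right] = \left[P_0 \frac{\partial^2}{\partial t^2} m(\theta^*,t^*(\theta^*)),\ P_0 \frac{\partial^2}{\partial t\,\partial\theta} m(\theta^*,t^*(\theta^*))\right] + o_P(1),$$
to obtain from (8.13), writing $t^* := t^*(\theta^*)$,
$$-P_n \frac{\partial}{\partial t} m(\theta^*,t^*) = \left(\left[P_0 \frac{\partial^2}{\partial t^2} m(\theta^*,t^*),\ P_0 \frac{\partial^2}{\partial t\,\partial\theta} m(\theta^*,t^*)\right] + o_P(1)\right) a_n. \tag{8.15}$$
In the same way, using a Taylor expansion of (8.12), we obtain
$$-P_n \frac{\partial}{\partial\theta} m(\theta^*,t^*) = \left(\left[P_0 \frac{\partial^2}{\partial\theta\,\partial t} m(\theta^*,t^*),\ P_0 \frac{\partial^2}{\partial\theta^2} m(\theta^*,t^*)\right] + o_P(1)\right) a_n. \tag{8.16}$$
From (8.15) and (8.16), we get
$$\sqrt{n}\, a_n = \begin{pmatrix} P_0 \frac{\partial^2}{\partial t^2} m(\theta^*,t^*) & P_0 \frac{\partial^2}{\partial t\,\partial\theta} m(\theta^*,t^*) \\ P_0 \frac{\partial^2}{\partial\theta\,\partial t} m(\theta^*,t^*) & P_0 \frac{\partial^2}{\partial\theta^2} m(\theta^*,t^*) \end{pmatrix}^{-1} \sqrt{n} \begin{pmatrix} -P_n \frac{\partial}{\partial t} m(\theta^*,t^*) \\ -P_n \frac{\partial}{\partial\theta} m(\theta^*,t^*) \end{pmatrix} + o_P(1). \tag{8.17}$$
Denote $S$ the $(1+l+d)\times(1+l+d)$-matrix defined by
$$S := \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} := \begin{pmatrix} P_0 \frac{\partial^2}{\partial t^2} m(\theta^*,t^*) & P_0 \frac{\partial^2}{\partial t\,\partial\theta} m(\theta^*,t^*) \\ P_0 \frac{\partial^2}{\partial\theta\,\partial t} m(\theta^*,t^*) & P_0 \frac{\partial^2}{\partial\theta^2} m(\theta^*,t^*) \end{pmatrix}.$$
Hence, we obtain
$$\sqrt{n}\begin{pmatrix} \hat t(\hat\theta_\varphi) - t^*(\theta^*) \\ \hat\theta_\varphi - \theta^* \end{pmatrix} = S^{-1}\sqrt{n}\begin{pmatrix} -P_n \frac{\partial}{\partial t} m(\theta^*,t^*) \\ -P_n \frac{\partial}{\partial\theta} m(\theta^*,t^*) \end{pmatrix} + o_P(1),$$
and the CLT and Slutsky's theorem conclude the proof.

2) Using the facts that
$$\hat t(\hat\theta_\varphi) - t^*(\theta^*) = O_P(1/\sqrt{n}), \qquad P_n \frac{\partial m(\theta^*,t^*(\theta^*))}{\partial t} = P_0 \frac{\partial m(\theta^*,t^*(\theta^*))}{\partial t} + o_P(1) = o_P(1)$$
and
$$\hat\theta_\varphi - \theta^* = O_P(1/\sqrt{n}), \qquad P_n \frac{\partial m(\theta^*,t^*(\theta^*))}{\partial\theta} = P_0 \frac{\partial m(\theta^*,t^*(\theta^*))}{\partial\theta} + o_P(1) = o_P(1),$$
we can write
$$\begin{aligned}\sqrt{n}\big(\widehat{D}_\varphi(\mathcal{M},P_0) - D_\varphi(\mathcal{M},P_0)\big) &= \sqrt{n}\big(P_n m(\hat\theta_\varphi,\hat t(\hat\theta_\varphi)) - P_0 m(\theta^*,t^*(\theta^*))\big) \\ &= \sqrt{n}\big(P_n m(\theta^*,t^*(\theta^*)) - P_0 m(\theta^*,t^*(\theta^*))\big) + o_P(1),\end{aligned}$$
and the CLT and Slutsky's theorem end the proof.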

References 

Baggerly, K. A. (1998). Empirical likelihood as a goodness-of-fit measure. Biometrika, 85(3), 535-547.

Bertail, P. (2006). Empirical likelihood in some semiparametric models. Bernoulli, 12(2), 299-331.

Broniatowski, M. and Keziou, A. (2006). Minimization of $\varphi$-divergences on sets of signed measures. Studia Sci. Math. Hungar., 43(4), 403-442; arXiv:1003.5457.

Broniatowski, M. and Keziou, A. (2008). Estimation and tests for models satisfying linear constraints with unknown parameter. arXiv:0811.3477v1.

Broniatowski, M. and Keziou, A. (2009). Parametric estimation and tests through divergences and the duality technique. J. Multivariate Anal., 100(1), 16-36.




Chen, X., Hong, H., and Shum, M. (2007). Nonparametric likelihood ratio model selection tests between parametric likelihood and moment condition models. J. Econometrics, 141(1), 109-140.

Corcoran, S. (1998). Bartlett adjustment of empirical discrepancy statistics. Biometrika, 85, 967-972.

Cressie, N. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. Ser. B, 46(3), 440-464.

Csiszár, I. (1963). Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl., 8, 85-108.

Csiszár, I. (1967). On topology properties of f-divergences. Studia Sci. Math. Hungar., 2, 329-339.

Haberman, S. J. (1984). Adjustment by minimum discriminant information. Ann. Statist., 12(3), 971-988.

Hansen, L., Heaton, J., and Yaron, A. (1996). Finite-sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics, 14, 262-280.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4), 1029-1054.

Hjort, N. L., McKeague, I. W., and Van Keilegom, I. (2009). Extending the scope of empirical likelihood. Ann. Statist., 37(3), 1079-1111.

Imbens, G. W. (1997). One-step estimators for over-identified generalized method of moments models. Rev. Econom. Stud., 64(3), 359-383.

Keziou, A. (2003). Dual representation of $\varphi$-divergences and applications. C. R. Math. Acad. Sci. Paris, 336(10), 857-862.

Kitamura, Y. (2001). Asymptotic optimality of empirical likelihood for testing moment restrictions. Econometrica, 69(6), 1661-1672.

Kitamura, Y. (2007). Empirical likelihood methods in econometric theory and practice. Cambridge University Press.




Liese, F. and Vajda, I. (1987). Convex statistical distances, volume 95. BSB B. G. Teubner Verlagsgesellschaft, Leipzig.

McCullagh, P. and Nelder, J. A. (1983). Generalized linear models. Monographs on Statistics and Applied Probability. Chapman & Hall, London.

Newey, W. K. and Smith, R. J. (2004). Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica, 72(1), 219-255.

Owen, A. (1990). Empirical likelihood ratio confidence regions. Ann. Statist., 18(1), 90-120.

Owen, A. (1991). Empirical likelihood for linear models. Ann. Statist., 19(4), 1725-1747.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237-249.

Owen, A. B. (2001). Empirical Likelihood. Chapman and Hall, New York.

Pardo, L. (2006). Statistical inference based on divergence measures, volume 185 of Statistics: Textbooks and Monographs. Chapman & Hall/CRC, Boca Raton, FL.

Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist., 22(1), 300-325.

Rockafellar, R. T. (1970). Convex analysis. Princeton University Press, Princeton, N.J.

Schennach, S. M. (2007). Point estimation with exponentially tilted empirical likelihood. Ann. Statist., 35(2), 634-672.

Sheehy, A. (1987). Kullback-Leibler constrained estimation of probability measures. Report, Dept. Statistics, Stanford Univ.

Smith, R. J. (1997). Alternative semi-parametric likelihood approaches to generalized method of moments estimation. Economic Journal, 107, 503-519.

van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.

White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50(1), 1-25.




1 LSTA, Université Pierre et Marie Curie - Paris 6. E-Mail: michel.broniatowski@upmc.fr
2 Laboratoire de Mathématiques de Reims, EA 4535, Université de Reims Champagne-Ardenne. E-Mail: amor.keziou@upmc.fr



